SECTION I


INTRODUCTION 

The  geometry  of  the  engagement  of  an  air-to-air  missile  guidance  scenario 
involves  several  elements  that  make  the  problem  difficult  to  solve  using 
straightforward  techniques.  The  problem  involves  highly  nonlinear  geometry  as 
well  as  nonlinear  guidance  and  control  models.  The  scenario  involves 
uncertainties  in  the  trajectory  of  the  maneuvering  target  as  well  as  in  the 
magnitude  and  type  of  the  maneuvers  that  make  the  tracking  problem  more 
complex. Finally, the digital implementation of the filtering and guidance
algorithms has to reside in an airborne computer, and hence quantization
errors need to be accounted for in the algorithms.

The objective of this work is to develop an approximate model for
nonlinear dynamic systems that can serve as a model for the air-to-air
engagement scenario. Such a model is based on a piecewise linear approximation
of the nonlinearities; the resulting model is then further approximated by
a set of linear models which are switched according to a Markov law. This
model is then used to derive an implementable nonlinear filtering scheme for
the tracking of such targets. This segment of the work represents the major
part of this report. The extension of such models to higher dimensions and to
practical scenarios is under continuing investigation.

A second major segment deals with the finite word-length implementation of
the  resulting  filters.  An  error  model  for  such  filters  is  derived  that 
considers  the  quantization  effects.  Two  aspects  of  the  model  are  considered: 
The  first  derives  bounds  on  the  errors  due  to  the  quantization,  and  the  second 
derives  corrections  to  the  filters  to  improve  the  performance  subject  to  the 
quantized  implementations.  A  systematic  design  procedure  for  such  systems  is 
under  continuing  study. 

Since  the  two  approximations  described  above  involve  both  quantization  and 
systems  that  exhibit  switched  behavior  among  linear  models,  the  third  part  of 
the  report  is  concerned  with  the  general  behavior  of  quantized  or  switched 
systems.  These  may  be  modeled  by  what  is  known  as  hybrid  system  models. 
Since  the  original  approximation  of  the  nonlinear  model  involves  switching 
behavior  that  exhibits  fast  and  slow  dynamics,  this  research  concentrated  on 
two  general  aspects  of  the  models.  First,  general  properties  of  hybrid 
systems  for  estimation  and  control  purposes  were  derived.  Second,  the  study 
of  such  systems  when  subjected  to  slow  and  fast  dynamics  has  allowed  the 
derivation  of  decoupled  multiple  time-scale  implementations  of  controllers  for 
such  systems. 

The  results  derived  in  these  three  parts  of  the  project  will  be  described 
in  the  body  of  the  report,  with  the  major  derivations  given  in  the  Appendices. 


SECTION  II 


MARKOV  APPROXIMATION  FILTERS 

The  primary  objective  of  the  research  was  to  develop  approximate  filtering 
schemes  for  nonlinear  dynamic  models.  The  basic  approach  is  based  on 
approximating  the  nonlinear  dynamics  by  piecewise  linear  segments.  Such  an 
approximation  can  be  made  arbitrarily  accurate  as  the  number  of  segments  is 
increased.  The  next  step  in  the  approximation  is  to  assume  that  the 
transition  from  one  linear  segment  to  another  is  based  on  a  Markov  transition 
law.  In  addition  it  is  assumed  that  each  linear  submodel  extends  over  the 
entire space so that we obtain a switched Markov linear approximation. This
approximation is then used to develop an implementable nonlinear filtering
structure.
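The claim that the piecewise linear approximation improves as segments are added can be illustrated numerically. The sketch below is only an illustration: the test nonlinearity (a sine), the equal-width segments, and the helper names `piecewise_linear` and `max_error` are assumptions and do not come from the report.

```python
import numpy as np

def piecewise_linear(g, lo, hi, n_segments):
    """Return a function that interpolates g linearly on n_segments equal pieces."""
    knots = np.linspace(lo, hi, n_segments + 1)
    values = g(knots)
    return lambda x: np.interp(x, knots, values)

def max_error(g, lo, hi, n_segments, n_test=2001):
    """Largest absolute gap between g and its piecewise linear approximation."""
    x = np.linspace(lo, hi, n_test)
    approx = piecewise_linear(g, lo, hi, n_segments)
    return float(np.max(np.abs(g(x) - approx(x))))

# Hypothetical smooth nonlinearity; the error shrinks as segments are added.
errors = {n: max_error(np.sin, -np.pi, np.pi, n) for n in (3, 6, 12, 24)}
for n, e in errors.items():
    print(n, round(e, 4))
```

For a twice-differentiable nonlinearity the interpolation error falls roughly with the square of the segment width, which is the sense in which the approximation can be made arbitrarily accurate.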

The research on the new filter structure may be divided into three primary
phases. In the first phase, the mathematical theory associated with the
switched Markov approximation to piecewise linear systems was developed, and
all necessary constraints and approximations were clarified (References 1 and
2; see Appendices A and B). The extension of the model to higher dimensions
is given in Reference 3 (see Appendix C). In the next phase of the research,
the basic structure of the new filtering scheme was developed (References 4
and 5; see Appendices D and E). During the final and most recent phase,
attention has been directed toward improving the performance of the basic
structure, albeit at the expense of complexity (Reference 6; see Appendix F).
Throughout the project, the development of Monte-Carlo digital computer
simulations has accompanied the theoretical research.

Given the basic underlying piecewise linear model (or any model that may
be accurately modelled as piecewise linear), the basic filter is developed by
assuming that the single (N-segment) piecewise linear system may be
approximated by N linear models, with each (time series) sample chosen at
random from one of the N systems running in parallel (the switched Markov
model). The assumption of Markov switching leads to a well known optimal
filtering scheme which is asymptotically unrealizable because its complexity
increases exponentially with time. The research performed under this contract
sought to develop a new realizable (albeit suboptimal) filtering scheme based
on the optimal result. In essence, the goal was to decide how best to reduce
the required computational complexity of the optimal filter, while still
preserving its basic structure.
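One standard way to keep a bank of N parallel filters from growing with time is to merge the N updated estimates into a single Gaussian after every measurement (first-order generalized pseudo-Bayesian merging). The sketch below shows that generic technique for a scalar state. It is not the filter developed in this report; the measurement model y = x + v and all names (`gpb1_step`, `a`, `b`, `q`, `r`, `Pi`) are assumptions made for illustration.

```python
import numpy as np

def gpb1_step(m, P, mu, y, a, b, q, r, Pi):
    """One merge-after-update step for N scalar linear models.

    Assumed model j: x' = a[j]*x + b[j] + w, w ~ N(0, q); y = x' + v, v ~ N(0, r).
    m, P : merged mean and variance from the previous step
    mu   : previous macro-state probabilities; Pi[i, j] = P(next = j | current = i)
    """
    a, b, mu = np.asarray(a, float), np.asarray(b, float), np.asarray(mu, float)
    c = mu @ np.asarray(Pi, float)          # predicted macro-state probabilities
    m_pred = a * m + b                      # N Kalman predictions in parallel
    P_pred = a**2 * P + q
    S = P_pred + r                          # innovation variances
    K = P_pred / S                          # Kalman gains
    m_upd = m_pred + K * (y - m_pred)
    P_upd = (1.0 - K) * P_pred
    lik = np.exp(-0.5 * (y - m_pred)**2 / S) / np.sqrt(2.0 * np.pi * S)
    mu_new = c * lik
    mu_new = mu_new / mu_new.sum()          # posterior macro-state probabilities
    m_new = float(mu_new @ m_upd)           # moment-matched merge into one Gaussian
    P_new = float(mu_new @ (P_upd + (m_upd - m_new)**2))
    return m_new, P_new, mu_new

# Hypothetical two-model example.
Pi = np.array([[0.9, 0.1], [0.2, 0.8]])
m1, P1, mu1 = gpb1_step(0.0, 1.0, [0.5, 0.5], 1.0, [0.5, 0.9], [0.0, 0.3], 1.0, 1.0, Pi)
```

With a single model the step reduces to an ordinary scalar Kalman filter, which gives a quick sanity check on the recursion.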

In phase two of the research, a filter structure was developed around a
consistency update (Reference 5): the estimate of each filter tuned to a
particular linear model (macro-state) is checked to ensure that the estimate
is appropriate for that specific filter. When a filter and its
estimate are not consistent, less weight is put upon that estimate in the
overall calculation. Further, the filter uses aggregation techniques to
reduce the number of filters (and hence the computational complexity) at each
time step to a pre-determined value.
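The consistency idea can be sketched as follows. The rule used here, weighting each filter by the probability mass that its own Gaussian estimate places inside the region of its macro-state, is an assumption chosen for illustration; the actual update is the one given in Reference 5. The function names and the two-region example are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def consistency_weights(means, variances, probs, edges):
    """Reweight filter i by the mass its estimate puts in [edges[i], edges[i+1]].

    means, variances : per-filter Gaussian estimates
    probs            : prior macro-state probabilities of the filters
    edges            : N+1 region boundaries (may include -inf and +inf)
    """
    weights = []
    for i, (m, v, p) in enumerate(zip(means, variances, probs)):
        s = math.sqrt(v)
        mass = normal_cdf((edges[i + 1] - m) / s) - normal_cdf((edges[i] - m) / s)
        weights.append(p * mass)            # inconsistent estimates get less weight
    total = sum(weights)
    return [w / total for w in weights]

# Both filters estimate x near -2, but only filter 0 is tuned to the region x < 0.
w = consistency_weights([-2.0, -2.0], [1.0, 1.0], [0.5, 0.5],
                        [-math.inf, 0.0, math.inf])
```

In this example the filter whose estimate lies outside its own region is almost entirely discounted.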


The  filtering  scheme  discussed  above  has  only  a  one  step  memory  (with 
respect  to  the  system  macro-state).  It  was  felt  that  superior  results  could 
be  obtained  by  conditioning  the  consistency  and  aggregation  calculations  on  a 
longer  chain  of  macro-states.  That  is  to  say,  the  basic  filter  structure 
bases  each  consistency  decision  on  only  the  current  macro-state,  but  it  is 
possible  to  look  at  a  longer  chain  of  macro-states  for  the  consistency  update. 
This scheme would have a memory of, say, k time samples. A collection of
sequences of macro-state trajectories would then exist for each time step. In
turn, each consistency and aggregation calculation would be conditioned upon
the macro-state sequence, rather than just the current macro-state. This new
result allows a systematic technique for improving filter performance by
expanding the memory of the filter. The development is given in Reference 6,
where the switched Markov model has also been extended to the observation
process.
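The cost of a k-step memory can be made concrete by enumerating the macro-state sequences on which the calculations would be conditioned. The sketch below is illustrative only; the initial probabilities, the transition matrix, and the function name `sequence_probabilities` are assumptions.

```python
from itertools import product

def sequence_probabilities(pi0, Pi, k):
    """Prior probability of every length-k macro-state sequence of a Markov chain."""
    n_states = len(pi0)
    out = {}
    for seq in product(range(n_states), repeat=k):
        p = pi0[seq[0]]
        for i, j in zip(seq, seq[1:]):
            p *= Pi[i][j]                   # chain rule along the sequence
        out[seq] = p
    return out

# Hypothetical two-state chain with a three-step memory: 2**3 = 8 sequences.
probs = sequence_probabilities([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], 3)
```

The number of sequences is N**k, so memory length and the number of filters retained after aggregation compete for the same computational budget.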

Direct mathematical analysis of the filter is thought to be intractable
due to its complexity, necessitating analysis through simulation techniques.
Several detailed digital computer simulations were developed to simulate the
new filter structure, and to allow comparison with conventional Extended
Kalman filtering (EKF) techniques. At the present time, the simulation
provides for the analysis of scalar systems with an arbitrary number of
macro-states, and with nonlinearity in both the state propagation function
and the measurement function. As currently implemented, the model does
not fully support the multiple-level memory. The results of exercising the
computer model indicate that the single stage filter shows considerable
advantages over the EKF, particularly when the nonlinearities are not
one-to-one.


Several questions exist concerning the advantages of the multi-memory
filter. First, since the system is assumed to jump from macro-state to
macro-state according to a Markov law, why should a filter require
a multi-step memory? One response is that since the true macro-state is never
known, the additional memory makes the detection problem (estimating
the macro-state) more reliable. Further, it is believed that the filtering
scheme will be applicable to many systems which are described by general
nonlinearities rather than Markov switched piecewise linear models (i.e., the
additional memory may improve filter performance in the presence of modeling
errors). An approximate filter for smooth nonlinear systems was derived as an
alternative to the present scheme in Reference 7 (see Appendix G). Another
issue concerns evaluation of the tradeoff between complexity and memory.
Additional memory increases complexity, so it will be necessary to
compare its benefit with that of increasing the number of filters (N) allowed
at any time. Model reduction techniques may be used to simplify the systems
in each macro-state. A technique for evaluating, and consequently
incorporating, the loss of information based on balancing has been developed
in References 8 and 9 (see Appendices H and I). A related result on reduced
order LQG design was derived in References 10 and 11 (see Appendices J and K).


SECTION III


FINITE  WORD  LENGTH  EFFECTS 

The  main  thrust  of  this  part  of  the  research  has  been  in  the  analysis  of 
the  performance  of  filtering  schemes  subject  to  finite  word-length 
implementations using floating point arithmetic. The results of the analysis
are then used to derive improved design techniques for such filters. Finally,
these results are related to the modeling and realization aspects, with
applications to switched Markov systems.

The bilinear model for finite word-length implementation of recursive
computations using floating point arithmetic was first developed in Reference
12 (see Appendix L). The effect of the word-length on different
implementations of recursive filters is discussed in References 13 and 14
(see Appendices M and N). The optimal derivation of the filter gain to
compensate for the quantization error due to the finite word-length is given
in Reference 15 (see Appendix O). The results show good
agreement between the model and the actual error obtained via simulation.
Furthermore, the results indicate the possibility of trading model size
against word-length in order to arrive at an optimal choice for the filter
computational complexity. The resulting related topics involve a more general
approach to realization theory and optimal design using geometric approaches.

More  recent  results  involve  the  improvement  of  the  floating-point  error 
model  by  including  impulsive  type  errors  that  may  occur  as  a  result  of 
subtraction operations in filter implementation. This has been accomplished
by using an additive Poisson error model in addition to the bilinear
(multiplicative) Gaussian error model.
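The multiplicative character of floating-point rounding can be checked directly. The sketch below uses IEEE half precision as a stand-in for a short word length and a first-order recursive averager as a stand-in for a filter; both choices, and the names `U16` and `recursive_average`, are assumptions for illustration and are not the error model of Reference 12. It relies only on the standard rounding model fl(x) = x(1 + e) with |e| no larger than the unit roundoff.

```python
import numpy as np

U16 = 2.0 ** -11   # unit roundoff of IEEE half precision (round to nearest)

def recursive_average(y, gain, dtype):
    """Run x <- x + gain*(y_k - x) with every operation rounded to dtype."""
    x = dtype(0.0)
    g = dtype(gain)
    for yk in y:
        x = dtype(x + g * (dtype(yk) - x))
    return float(x)

rng = np.random.default_rng(0)

# Rounding error is proportional to the magnitude of the rounded quantity.
values = rng.uniform(1.0, 100.0, 1000)
rounded = values.astype(np.float16).astype(np.float64)
max_rel_error = float(np.max(np.abs(rounded - values) / np.abs(values)))

# The same recursion in long and short word lengths.
y = 1.0 + 0.1 * rng.standard_normal(200)
x64 = recursive_average(y, 0.1, np.float64)
x16 = recursive_average(y, 0.1, np.float16)
print(max_rel_error, abs(x16 - x64))
```

Because the error scales with the quantity being rounded, it enters a recursive computation as a product of noise and state, which is why a bilinear model is a natural description.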

As for realization theory, we first distinguish between realization and
identification. In realization, all measurements are supposed to be error free.
In the stochastic context, for instance, this means that the joint distribution
of all relevant variables is exactly known. We see the identification problem
as the more realistic problem, in which the measurements are inaccurate. The
realization problem is therefore a fundamental idealization, and it is natural
to study this first. Moreover, any identification problem may be envisioned as
the realization problem of a perturbed system. Approximation problems then
relate to the previous ones as the realization of a system that is deliberately
perturbed, so as to trade off against some complexity measure. The main question
we considered is the following: given a sequence of data vectors, how can we
know whether it is a sample path from a stochastic system or
a set of input/output data for a deterministic system? The early results of
this theory are given in References 16, 17, and 18 (see Appendices P, Q,
and R). Reduction techniques for stochastic models are discussed in References
19 and 20 (see Appendix S).

The work on realization is still incomplete in many ways. More work is
needed to strengthen some arguments. The idea that the data are fundamental,
which leads to the measure theoretic approach (the measures are directly
determined from the data), will also enable us to generalize to a realization
procedure for deterministic and stochastic automata. In fact, it turns out
that the discussed setup is conceptually much simpler for systems over a
discrete state space, which is appropriate for the piecewise linear Markovian
approximation model for nonlinear systems discussed in Section I.

A  related  topic  is  the  use  of  a  geometric  approach  to  the  sensitivity 
problem  that  has  the  potential  to  lead  to  an  optimal  selection  of  a  design  of 
a  filter  given  a  finite  word-length  quantizer.  In  particular,  we  investigated 
the  problem  of  approximating  a  system  model  using  elements  or  parameters  from 
a finite set. The preliminary results of this effort are provided in
References 21 and 22 (see Appendices T and U).


SECTION  IV 


HYBRID  SYSTEMS  MODELS 

Hybrid  systems  are  system  models  that  include  both  discrete  and  continuous 
dynamics.  The  study  of  such  systems  is  directly  related  to  the  switched 
Markov  approximations  of  nonlinear  systems.  The  macro-states  provide  the 
discrete  states  while  the  trajectory  is  represented  by  continuous  states.  We 
also  obtain  a  hybrid  system  model  if  we  consider  a  quantized  implementation  of 
guidance  and  control  algorithms.  The  research  into  hybrid  systems  may  be 
divided  into  three  distinct  parts. 

The  first  considers  the  properties  of  systems  subject  to  quantization  and 
general  hybrid  systems  subject  to  fast  and  slow  dynamics.  Such  multiple  time- 
scale  dynamic  behavior  is  inherent  in  the  switched  Markov  approximation  in 
which  the  system  remains  in  the  contracting  macro-states  for  a  long  time  and 
in  the  expanding  macro-states  only  a  brief  time.  Three  major  results  were 
derived.  The  first  derives  the  limiting  behavior  of  hybrid  systems  when  both 
the  discrete  process  and  the  continuous  process  can  display  fast  and  slow 
dynamics. The results are given in Reference 23 (see Appendix V). Next, the
control algorithm for a quantized linear system subject to fast and slow
dynamics is examined, and an approximate method is used to implement the
control using singular perturbation theory (Reference 24; see Appendix W).
Finally, the results in Reference 24 are extended to a general multimodel
system that is piecewise linear in different regions, which is the basic
approximation of interest throughout this investigation. These are given in
Reference 25 (see Appendix X).

The second topic concerns the general properties of
hybrid systems. Properties such as observability, controllability, stability,
and stabilizability of such systems are derived (Reference 26; see Appendix Y).
The results are being used to derive guidance and control algorithms for such
systems, with direct applications to the control of the switched Markov systems
used in this report.

The third topic involves the study of incompletely known hybrid systems.
The incomplete knowledge may stem from, say, fluctuations of the operating
characteristic. For example, in radar, it may be the precise form of the
radiation pattern of an antenna. Typically, the exact locations of the many
nulls in the pattern are unknown, or the exact pattern is too complex to
describe, especially when multiple scatterers are used. We derived an
approximation method for this problem, which is exact if only one-step
prediction is used. It is possible to do so for both discrete and continuous
time systems, as the problem lies more with device characteristics than with
algorithms.

An additional approach that is also hybrid in nature involves the
modeling of the maneuvers of the tracked vehicle by a dynamic system whose
input is both Poisson and Gaussian noise. The approach assumes that the
Poisson process is dependent on the continuous state of the system. Optimal
filters and an approximate implementation for such models are derived in
Reference 27 (see Appendix Z). The approach is based on simultaneous
detection of the incident actions of the Poisson input and
estimation of the state of the system, which is the primary objective. The
combination of techniques derived in these various approaches should yield a
comprehensive procedure for tracking, guidance, and control of air-to-air
systems satisfying the hybrid model assumptions.
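A discrete-time sketch of a maneuver model driven by both Gaussian noise and state-dependent Poisson-type jumps is given below. The first-order dynamics, the Bernoulli approximation of the Poisson input over a short step `dt`, the symmetric jump sizes, and every name in the code are assumptions for illustration; the models actually treated are those of Reference 27.

```python
import numpy as np

def simulate_maneuver(n, dt, a, sigma, jump_size, rate, seed=0):
    """Simulate x' = a*x + Gaussian noise + occasional jumps.

    rate(x) is the state-dependent jump intensity; over a short step dt a jump
    occurs with probability min(1, rate(x)*dt) (Bernoulli approximation).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n + 1)
    jumps = np.zeros(n, dtype=bool)
    for k in range(n):
        p_jump = min(1.0, rate(x[k]) * dt)
        jumps[k] = rng.random() < p_jump
        impulse = jump_size * rng.choice([-1.0, 1.0]) if jumps[k] else 0.0
        x[k + 1] = a * x[k] + sigma * np.sqrt(dt) * rng.standard_normal() + impulse
    return x, jumps

# Hypothetical intensity that grows with the magnitude of the state.
x, jumps = simulate_maneuver(1000, 0.01, 0.99, 1.0, 5.0, lambda s: 0.5 + abs(s))
```

A tracking filter for such a model has to detect the jump times and estimate the state at once, which is the joint detection and estimation problem described above.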


SECTION  V 


CONCLUSIONS 

The results of this work indicate that an implementable nonlinear filter
can be developed for a scenario that includes the air-to-air guided missile as
a special case. In particular, the approximate filter can be derived by a
consistent set of approximating procedures that can be improved in accuracy as
needed at the expense of complexity. The filter performs better than typical
filters using different approximations, especially for ambiguous and sharp
nonlinearities. The implementation can be simplified if the special two-time
scale behavior of the resulting approximation is utilized. The filter
implementation may be improved if the finite word-length of the quantizers
used in the computations is used to optimally select the filter gains.
Finally, the general properties of hybrid systems and the general theory of
realization and sensitivity can be used to design improved guidance and
control algorithms for such scenarios.


Abstract

This paper is concerned with an approximate
linear switched-parameter Markov model for
discrete-time systems whose nonlinear homogeneous
part is piecewise linear. A scalar system with
white Gaussian noise input is considered and it
is shown that a steady-state approximation is
valid for two extreme cases. The first is the
case when all the slopes of the piecewise linear
model are stable, and the regions are large
relative to the noise variance. The second is the
case when there are unstable regions adjoining
stable regions, and the unstable regions are small
relative to the noise variance.

I. INTRODUCTION

There has been a large amount of work devoted
to estimation and filtering in switched
environments (e.g., [1-7]), which involves linear
systems with unknown parameters or models which
can take different values at every observation
instant based on a Markov chain model. Such
models have been used to represent systems with
time-varying but unknown parameters observed in
noise with unknown but slowly changing covariance
matrix. Such schemes are applicable to nonlinear
systems when represented as a Markov transition
among linear models HI. Since many nonlinear
problems defy systematic analytical procedures,
this paper is concerned with exploring the
approximation of a nonlinear dynamic system by a
set of linear models with Markov transitions
among these linear models. The objective is to
derive the approximate model, and investigate the
conditions under which it is a valid
representation of the nonlinear system. For simplicity we
consider a scalar discrete-time system

where $x_k$ is the state at time $t_k$, $g(x)$ is a
zero-memory nonlinearity, and $\{w_k\}$ is a white Gaussian
noise sequence with zero mean and variance $\sigma^2$.
The primary assumption is that $g(x)$ is given as a
piecewise linear function, or that it can be
adequately represented by such an approximation:

Let the macrostate $S_i$ define the region $\alpha_i < x < \alpha_{i+1}$;
then from (1) and (2) we may derive transition
probabilities

The approximation considered in this paper

which removes the dependence of the transition
probabilities on the actual value of $x_k$, and also
assumes that in macrostate $S_i$ the system
satisfies the linear equation

Furthermore, it is assumed that the transition
from state to state follows a Markov chain rule
with transition matrix $\{\Pi_{ij}\}$.
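The switched Markov approximation can be explored numerically by simulating a piecewise linear system and tabulating how often it moves between macrostates. The sketch below assumes the system has the form $x_{k+1} = g(x_k) + w_k$, which is suggested by the surrounding definitions but is an assumption here. The example nonlinearity, the region edges, and the function names are likewise hypothetical.

```python
import numpy as np

def simulate(g, sigma, n, x0=0.0, seed=0):
    """Simulate x[k+1] = g(x[k]) + w[k] with w[k] ~ N(0, sigma**2) (assumed form)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, n)
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        x[k + 1] = g(x[k]) + w[k]
    return x

def empirical_transition_matrix(x, edges):
    """Row-normalized counts of macrostate transitions; edges split the real line."""
    s = np.digitize(x, edges)               # macrostate index of every sample
    n_states = len(edges) + 1
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (s[:-1], s[1:]), 1.0)
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

# Hypothetical continuous, stable, odd-symmetric piecewise linear g with break at 1.
def g_example(x):
    return 0.5 * x if abs(x) < 1.0 else 0.8 * x - 0.3 * np.sign(x)

path = simulate(g_example, 1.0, 5000)
Pi_hat = empirical_transition_matrix(path, [-1.0, 1.0])
print(np.round(Pi_hat, 3))
```

The empirical matrix averages over the position of the state inside each region, which is the dependence that the approximation deliberately removes.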

The objective of this paper is to analyze
the validity of the approximation depending on
the assumptions on the parameters $(\alpha_i, a_i,
b_i, \sigma)$. In this section the notations of the
transition and other density functions of the
system state are defined and derived. Then the
analysis is carried out for the special case of
an odd symmetry in $g(x)$ with $N = 3$ to simplify
the analysis. The special case of stable
systems (4) is first considered, and then the
unstable case falling between two stable regions
is investigated. The resulting expressions,
while specialized to $N = 3$, may be generalized
without major difficulty to the arbitrary case.

Let $f_k(x)$ denote the probability density
function of $x_k$; then we define two new
quantities:

The transition probabilities and the recursive
relation of the density functions are given by

Assuming a steady state distribution and
transition matrix exist, then they must satisfy:


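$$f(x) = \sum_{i=1}^{N} p_i(x) \tag{9}$$

$$P_i = \int_{\alpha_i}^{\alpha_{i+1}} f(x)\,dx \tag{10}$$

$$\Pi_{ij} = \frac{1}{P_i} \int_{\alpha_j}^{\alpha_{j+1}} p_i(x)\,dx \tag{11}$$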


The transition matrix $\{\Pi_{ij}\}$ in turn leads to a
steady state probability, $\pi_i$, of the system being
in state $S_i$. The remaining parts of the paper
are devoted to exploring the conditions under
which, for the model defined by (3) and (4), one
obtains $\pi_i \approx P_i$, and the $P_i^{-1} p_i(x)$ are
approximately equal to the density obtained for the system
(4) under state $S_i$. The derivation is performed
for an odd-symmetric $g(x)$ with $N = 3$ as discussed
above.


II. THE STABLE CASE


For the stable case we consider for
convenience $g(x)$ of the form

$$g(x) = \begin{cases} a x, & |x| \le \alpha \\ b x + \delta\,\operatorname{sgn} x, & |x| > \alpha \end{cases} \tag{12}$$

where $\delta = \alpha(a-b)$, and assume that $|a| < 1$ and $|b|
< 1$, so that the system is stable in each
region. The system has three states denoted
$S_0$: $|x| < \alpha$, $S_+$: $x > \alpha$ and $S_-$: $x < -\alpha$. Due
to symmetry, the steady-state densities and
transitions are given by:


$$f(x) = p_0(x) + p_+(x) + p_+(-x)$$

It is obvious from these expressions that

$$\int_{-\infty}^{\infty} p_0(x)\,dx = \int_{-\alpha}^{\alpha} f(x)\,dx, \qquad \int_{-\infty}^{\infty} p_+(x)\,dx = \int_{\alpha}^{\infty} f(x)\,dx \tag{15}$$


If we now assume that $\alpha/\sigma \gg 1$ then it appears
that the transition probabilities to other states
are relatively small, and hence an approximate
solution for the Markov linear model becomes a
weighted sum of the steady-state densities in
each region multiplied by the corresponding
probability of being in that macrostate. The steady
state probability density solutions of
(4) for the nonlinearity (12) in the three
regions $S_0$ and $S_\pm$ are denoted by $q_0(x)$ and $q_\pm(x)$
respectively, and are given by

If we denote the steady-state probabilities of
being in macrostate $S_0$, $S_\pm$ by $\pi_0$ and $\pi_\pm$
respectively, we arrive at the approximation

$$p_0(x) \approx \pi_0\, q_0(x), \qquad p_+(x) \approx \pi_+\, q_+(x) \tag{19}$$

where

It remains to show that this heuristic
approximation is indeed valid as a first-order approximation
to the functions as defined by the integral
operators (13). The substitution of $q_0(x)$ and $q_\pm(x)$
in the integral equations (13) yields the
following error terms in $p_0(x)$ and $p_+(x)$ respectively:


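$$\frac{\pi_0}{\sqrt{2\pi}\,\sigma}\left[\int_{-\infty}^{-\alpha} e^{-\frac{(x-ay)^2}{2\sigma^2}} q_0(y)\,dy + \int_{\alpha}^{\infty} e^{-\frac{(x-ay)^2}{2\sigma^2}} q_0(y)\,dy\right] - \frac{\pi_+}{\sqrt{2\pi}\,\sigma}\int_{-\alpha}^{\alpha} e^{-\frac{(x-ay)^2}{2\sigma^2}} \left(q_+(y) + q_+(-y)\right) dy \tag{21}$$

$$-\frac{1}{\sqrt{2\pi}\,\sigma}\int_{\alpha}^{\infty} e^{-\frac{(x-by-\delta)^2}{2\sigma^2}} \left(\pi_0\, q_0(y) + \pi_+\, q_+(-y)\right) dy + \frac{\pi_+}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\alpha} e^{-\frac{(x-by-\delta)^2}{2\sigma^2}} q_+(y)\,dy \tag{22}$$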


These terms can be expressed explicitly by using
the Gaussian density and distribution
functions. It can be shown after lengthy
manipulations that these terms become negligible
if $\alpha/\sigma \gg 1$, provided that the other parameters
are sufficiently bounded as follows:

$$|a| < K_1 < 1, \qquad |b| < K_2 < 1, \qquad |b| - |a| > K_3 > 0$$

The derivation for this case does not yield an
expansion for the appropriate densities, so that
correction terms may be used. The generalization
to $N > 3$ and to nonsymmetric nonlinearities
appears straightforward even though tedious, and
the conditions in this case are related to
relatively large linear regions, and sufficiently
different slopes which are bounded away from
unity.
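The steady-state densities $q_0$ and $q_\pm$ used in the approximation (19) are those of a stable scalar linear model extended over the whole line. For such a model, written here as $x_{k+1} = s\,x_k + d + w_k$ with $|s| < 1$ (an assumed form for each linear submodel, with slope $s$ and offset $d$ as hypothetical names), the steady state is the standard first-order autoregressive result: Gaussian with mean $d/(1-s)$ and variance $\sigma^2/(1-s^2)$. The sketch below checks this closed form against the moment recursion.

```python
def steady_state_gaussian(slope, offset, sigma):
    """Mean and variance of the stationary density of x' = slope*x + offset + w."""
    if abs(slope) >= 1.0:
        raise ValueError("no steady state unless |slope| < 1")
    return offset / (1.0 - slope), sigma**2 / (1.0 - slope**2)

def iterate_moments(slope, offset, sigma, n=200, mean=0.0, var=0.0):
    """Propagate the mean and variance n steps through the linear recursion."""
    for _ in range(n):
        mean = slope * mean + offset
        var = slope**2 * var + sigma**2
    return mean, var

# Hypothetical stable case: a = 0.5, b = 0.8, alpha = 4, sigma = 1.
a, b, alpha, sigma = 0.5, 0.8, 4.0, 1.0
delta = alpha * (a - b)                      # offset of the outer pieces in (12)
q0 = steady_state_gaussian(a, 0.0, sigma)    # inner region submodel
qp = steady_state_gaussian(b, delta, sigma)  # outer (positive) region submodel
print(q0, qp, iterate_moments(b, delta, sigma))
```

The check covers only the linear submodels; whether the weighted combination in (19) is accurate is the separate question addressed by the error terms above.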

III. THE UNSTABLE CASE

This section is concerned with cases when
the nonlinearity $g(x)$ has slopes greater than
unity, thus leading to unstable models in
(4). Since in this case the system leaves the
regions with probability one, in order to obtain
a meaningful model, we have to assume that
unstable regions are bordered by stable ones.
Furthermore, it is assumed that the external
regions are stable so that the overall system
will not diverge. Again, for convenience we
consider the case $N = 3$ given in (12) with the
added assumption that $|b| < 1$, while $|a| > 1$. If
the notations for the densities and transition
probabilities of Section II are used, then for
this case (13) and (14) continue to be valid.

However, while we may consider a steady state for
$f(x)$ we cannot postulate the existence of a
steady state for $q_0(x)$ which represents the
steady state density for the system in region
$S_0$. In order to obtain reasonably valid
approximations, it is assumed that $\alpha/\sigma \ll 1$ so that the
transition probability $\Pi_{00} \ll 1$. In this case
asymptotic expansions of the steady-state
solution for $f(x)$ are possible, and thus $p_0(x)$ and
$p_+(x)$ can be identified and compared to the
linear Markov models. However, due to the size
of the unstable region, the two stable regions
behave in the limit as a single region. In order
to avoid this rather mundane case, the additional
assumption that $|\delta|/\sigma \gg 1$ is made. It can be
shown that these conflicting requirements can be
met if the slope of the unstable region is large.
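The regime described here, a narrow unstable region with a steep slope flanked by stable regions, can be illustrated by simulation. The sketch below assumes the system form $x_{k+1} = g(x_k) + w_k$ with $g$ as in (12); the numerical values are hypothetical and chosen so that $\alpha/\sigma$ is small while $|\delta|/\sigma$ is not.

```python
import numpy as np

def g12(x, a, b, alpha):
    """Odd-symmetric three-piece nonlinearity with delta = alpha*(a - b)."""
    delta = alpha * (a - b)
    return a * x if abs(x) <= alpha else b * x + delta * np.sign(x)

def occupancy_of_S0(a, b, alpha, sigma, n=20000, seed=0):
    """Fraction of samples with |x| < alpha for x' = g12(x) + w (assumed form)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, n)
    x, inside = 0.0, 0
    for k in range(n):
        x = g12(x, a, b, alpha) + w[k]
        inside += abs(x) < alpha
    return inside / n

# Hypothetical values: alpha/sigma = 0.1 and |delta|/sigma = 1.95.
frac_S0 = occupancy_of_S0(a=20.0, b=0.5, alpha=0.1, sigma=1.0)
print(frac_S0)
```

The state spends almost all of its time in the stable outer regions and only passes through the unstable one, consistent with a small steady-state probability for that macrostate.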

We start by obtaining a series expansion for
the solution of (12) and (13). Let $p_1(x) = p_+(x)
+ p_+(-x)$; then we have

$$p_0(x) = G(\alpha)\,[p_0 + p_1] \tag{23}$$

$$p_1(x) = H(\alpha)\,[p_0 + p_1] \tag{24}$$

where $G(\alpha)$ and $H(\alpha)$ are integral operators
defined by

$$H(\alpha)\,\phi = \int H(\alpha; x, y)\,\phi(y)\,dy \tag{26}$$

Since the $L_\infty$ norm of $G(\alpha)$ is bounded by $(\alpha/\sigma)$,
it is possible to expand it in a series with
respect to the parameter $\alpha/\sigma$. Using the
assumptions $\alpha/\sigma \ll 1$ and $(a/b)(\alpha/\sigma) \gg 1$ we may use as a
starting point the steady state solutions of (4)
in regions $S_\pm$ as an approximation to $p_1(x)$,
namely

$$p_1(x) \approx \pi_+ \left(q_+(x) + q_+(-x)\right) \tag{27}$$

In view of the fact that $|p_0(x)| < (\alpha/\sigma)\sqrt{2/\pi}\,K$,
where $K$ is finite (the value of $K$ will be
verified in the sequel), the error terms of the
approximation (27) when substituted in (24) can
be derived in the same manner as for the stable
case of Section II. It remains to derive the
approximation for $p_0(x)$ by substituting (27) in
(23). The first-order term for $p_0(x)$ can be
obtained by integration as:

where $\Phi$ is the unit Gaussian distribution. The
function $\psi(x)$ achieves its maximum at


-a(l  -  -S)  (1+b) 


and the maximum value is given by


QO  a Q 


A,  <  1 


However, the maximum of $\psi(x)$ occurs inside $(-\alpha, \alpha)$
while that of the multiplying exponential is
in $(\alpha, \infty)$. Consequently, the following bounds for
$p_0(x)$ in the two intervals may be obtained: for
$|x| \le \alpha$, we have


1 P  (x) I  < 


and for $|x| > \alpha$

These bounds validate the original assumptions
made about $p_0(x)$. Furthermore, it can be easily
shown that the steady state probability of
macrostate $S_0$ is given by


/  p  (x)dx  •  /  tqv(x)  ♦  qv(-x)ldx 
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(33) 


which confirms the assumptions that 2 << 1 and
the size of the bound on $p_0(x)$. Additional
manipulations of (28) allow the expression for $p_0(x)$
to be approximated by
The density obtained in (34) is equivalent to
that derived for the system state $x_{k+1}$ in one
step


‘k*1 


♦  \  »  *„ 


c  3. 


(36) 

that 


unifying  that  th*  transition  <<  1 
higher-order terms of the approximation of $p_0(x)$
are negligible. The bounds on the transition
probabilities are obtained from their approximate
expressions by using (14), (27), (28) and (34).


We now consider the degenerate case of $\varepsilon \ll 1$
and $\varepsilon\, a/b < 1$. It is convenient to reformulate
the problem as a perturbation problem. Upon
defining the perturbation (or scattering) operator

$$\tilde{H}(a) = H(a) - H(0) \tag{37}$$

and expanding in a Taylor series about $a = 0$, one
gets
$\langle F(y) \rangle$ denotes the average of $F$ over $(0, a]$, i.e.

$$\langle F(y) \rangle = \frac{1}{a} \int_0^a F(y)\,dy \tag{40}$$


The first-order scattering operator is
*-b 
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♦  (y)  dy 
(41) 


For what follows, it will be convenient to write
the unperturbed operator $H(0)$ and the scattering
operator $L$ in a symmetric form. Indeed, by
substituting $-y$ for $y$ in $H(0; x, y)$ and $L(x, y)$ and
adding, we get

$$H \varphi = \int H(0; x, y)\, \mathrm{Ev}[\varphi(y)]\, dy \tag{42}$$

$$L \varphi = \int L(x, y)\, \mathrm{Odd}[\varphi(y)]\, dy \tag{43}$$

where $\mathrm{Ev}[\varphi]$ and $\mathrm{Odd}[\varphi]$ are respectively the even
and odd parts of $\varphi(\cdot)$. The norm of the
perturbing operator is bounded by


£  2  ±.  c± 


sob 


The assumptions made at the beginning of the
paragraph therefore validate a perturbation
expansion. Letting
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P,  (X)  ■  P,  (X)  ♦  CP,  (x)  ♦  c  p 
and  substituting  In 

P„  •  «2(P„  ♦  P,) 


(x) 


o  -'o  'V 

p,  •  CH  (O)  ♦  «)  (Po  4  Pl  ) 


(44) 

(45) 


wh*r£  we  wrote  explicitly  cG  for  G  and  e a 
for  u*  g«t»  up  to  1st  order 
The solution of (46b) is readily found to be
Note that this is the normalized solution.
Clearly, when the perturbation terms are to be
taken into account, the overall solution needs to
be renormalized, i.e.
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f(x)  •  po ( x)  ♦  p,  (x)  •  P1  (X)  ♦ 
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•  (l  ♦  «  ♦  C(l-H)”1  (B  ♦  BG)  jp}0’  (x) 

must integrate to one. The successive solutions
for $p_1^{(0)}$ and $p_1^{(1)}$ are then computed as a series
in the orthonormal Hermite functions
the equations
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yield  respectively 
The $\theta$ appearing in the coefficients for $p^{(2)}$ is
contained in the interval $(0, a)$ and arises from
the fact that $H$ contains an "average" of $Hf$, which
by Rolle's theorem equals the value of $Hf$ for
some point in that interval $(0, a)$. Further, the
expansion

B(oi*,8)  -  l  2(-b)n  ♦  (|i)  ♦„(£-)  (51) 

n«0  1  1 

is used. The use of these expansions then allows
the computation of the steady-state probabilities
and the transition probabilities up to first
order for the macrostates.

IV.  SUMMARY AND CONCLUSIONS

This paper showed that under certain restrictions
a piecewise nonlinear discrete-time
dynamic system may be approximated by Markov
transitions among several linear models, as far
as the steady-state distribution is concerned.
It remains to generalize the approach to higher
order systems as well as multiple nonlinearities.
The effect of the approximation on estimation
and filtering schemes needs to be further
explored. Finally, the continuous time version
requires a different approach because of the high
density of the level crossings at the boundary.
One approach is to model the noise not as white
noise, but as having a finite nonzero correlation
length. The additional aspects of the
Stratonovich integral definitions are expected to
be relevant.
ABSTRACT


This  paper  is  concerned  with  the  properties  of  piecewise 
linear  discrete-time  dynamic  systems  driven  by  white  Gaussian 
noise.  The  properties  of  the  deterministic  system  are  explored, 
and conditions for the existence of invariant distributions are
derived. The existence of an invariant distribution is then used
to  justify  the  approximation  of  the  stochastic  system  by  a 
switched  Markov  linear  model  if  the  piecewise  linear  regions  are 
large  "contracting"  ones  or  small  "expanding"  ones  relative  to  the 
input  noise  variance.  The  approach  is  expected  to  be  useful  for 
constructing  approximate  nonlinear  filtering  schemes  for  such 
systems. 


1.  INTRODUCTION

Piecewise  linear  systems  can  represent  an  approximation  to 
general  nonlinear  systems.  Usually,  nonlinear  filtering  schemes 
for  such  systems  are  not  exactly  implementable.  As  a  step  leading 
to  the  development  of  nonlinear  filtering  schemes  a  systematic 
analysis  of  such  systems  is  needed.  Since  it  is  not  possible  to 
obtain  exact  implementations,  one  must  resort  to  approximations. 
Such  approximations  can  be  made  at  the  filtering  stage,  or  they 
can be made at the modeling stage. Consequently, this paper is
concerned with the analysis of piecewise linear systems
driven by white Gaussian noise. However, as the analysis is
already complex in the deterministic case, we consider the
properties of the deterministic case first. The model we are
interested  in  is  the  switched  Markov  linear  model  for  which  there 
already  exist  several  approximate  approaches  for  the  solution  of 
the  filtering  problem.  This  model  assumes  that  underlying  the 
system  there  exists  a  finite  state  Markov  model,  such  that  under 
each  state  the  system  satisfies  one  linear  dynamic  model.  The 
paper  is  concerned  with  the  assumptions  that  are  needed  for  the 
approximation  to  be  a  valid  representation  for  the  original 
nonlinear  system. 

In  the  study  of  the  dynamics  of  discrete  nonlinear  systems,  it 
is  known  that  in  some  cases,  the  sensitivity  of  trajectories 
(sequences)  with  respect  to  initial  condition  may  lead  to  some 
wildly  unpredictable  dynamic  behavior,  even  though  the  system  is 
completely  deterministic.  It  appears  therefore  that  a  statistical 
approach  may  be  more  fruitful  than  a  purely  deterministic  one.  In 
fact,  what  seems  to  be  a  very  significant  problem  in  this  case  is 
the  study  of  the  flow  of  densities.  In  particular,  a  very 
important  question  is  the  existence  and  uniqueness  of  invariant 
measures  which  will  be  considered  in  the  first  part  of  this  paper. 
Its  importance  stems  from  the  fact  that  existence  and  uniqueness 
of  invariant  measures  imply  the  ergodicity  of  the  dynamical  map. 
The  second  part  will  build  upon  the  properties  of  the 
deterministic  distributions  to  obtain  the  characteristics  of  the 
system  for  the  stochastic  case  [1,2]. 

2.  DETERMINISTIC  NONLINEAR  SYSTEMS 

This  section  introduces  and  defines  the  terms  to  be  used  in 
the  later  sections  of  the  paper,  and  describes  the  properties  of 
piecewise  linear  dynamic  discrete-time  systems.  In  general  we  are 
concerned  with  scalar  systems  of  the  form 

$$x_{k+1} = f(x_k)$$

where $x_k$ is the state of the system at time $t_k$, and $f(x)$ is a
nonlinear function. The properties of such a system are discussed
in general, with particular attention to the piecewise linear
case.


2.1.  General  Maps 

Let $f$ be a map from $R^n$ to $R^n$, associated with a dynamical
system

$$x_{k+1} = f(x_k), \quad \text{where } x \text{ is in } R^n$$

Definitions: 

The map $f^n: R^n \to R^n$ is called the n-th iterated map of $f$
(the superscript on $f$ counts iterations, not the dimension).

The orbit of $x$ is the sequence $x, f(x), f^2(x), \ldots$

An equilibrium point is a point $x$ in $R^n$ for which $f(x) = x$.
Equilibrium points are also called fixed points.

A point $x$ is a periodic point for the map $f$ of period $p$ if it
is a fixed point for $f^p$. The least positive $p$ such that $f^p(x) = x$
is called the prime period of $x$.

If $x$ is periodic with prime period $p$, then the part $x, f(x),
f^2(x), \ldots, f^{p-1}(x)$ of the orbit of $x$ will be called the cycle of
$x$, and will be denoted by $\langle x \rangle$.
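The definitions of orbit, periodic point, and prime period can be illustrated with a short sketch. The tent map used here, and the function names `tent`, `orbit`, and `prime_period`, are illustrative assumptions rather than anything taken from the paper; exact rational arithmetic is used so that the equality test $f^p(x) = x$ is not disturbed by rounding.

```python
from fractions import Fraction


def tent(x):
    """Tent map on [0, 1]: slope +2 on [0, 1/2], slope -2 on (1/2, 1]."""
    return 2 * x if x <= Fraction(1, 2) else 2 - 2 * x


def orbit(f, x, n):
    """Return the first n points x, f(x), ..., f^(n-1)(x) of the orbit of x."""
    points = []
    for _ in range(n):
        points.append(x)
        x = f(x)
    return points


def prime_period(f, x, max_period=64):
    """Return the least p >= 1 with f^p(x) == x, or None if no p <= max_period works."""
    y = x
    for p in range(1, max_period + 1):
        y = f(y)
        if y == x:
            return p
    return None


x0 = Fraction(2, 7)
p = prime_period(tent, x0)      # prime period of x0 under the tent map
cycle = orbit(tent, x0, p)      # the cycle <x0>: one full period of the orbit
```

For this choice of map, the point 2/7 visits 4/7 and 6/7 before returning, so its cycle has three elements, while 2/3 is a fixed point.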

A set $S$ is an invariant set for $f$ if both
$f(S) \subset S$
$f^{-1}(S) \subset S$

Alternative  definitions  can  be  given  for  invariant  sets  by  virtue 
of  the  following  equivalence: 

Theorem  1:  The  following  are  equivalent: 

1)  The  set  S  is  invariant  with  respect  to  the  map  f 

2) $f^{-1}(S) = S$

3) For all integers $k$, the set $f^k(S)$ is included in $S$.

Proof: We first show the equivalence of 1) and 2). By definition,
we already have $f^{-1}(S) \subset S$. So we only need to show that
$S \subset f^{-1}(S)$. But this is obvious, since for all $x$ in $S$, $f(x)$ is in $S$ by
the first part of the definition. But then $x$ is also in the
inverse image $f^{-1}(S)$.

Conversely, if $f^{-1}(S) = S$, then $f^{-1}(S) \subset S$ is obvious, while for
all $x$ in $S = f^{-1}(S)$, clearly $f(x)$ is in $S$. Hence $f(S) \subset S$.

Next, letting $k$ equal 1 or -1 in 3) implies the definition.
Iteration on the conditions of the definition implies 3). ∎

Finally, if $f$ is invertible, then invariance of a set $S$ under
the map $f$ is equivalent to $f(S) = S$.

Related  to  the  notion  of  an  invariant  set  is: 

A closed, simply connected region $Q$ in $R^n$ is a trapping region
for $f$ if $f(Q)$ is contained in the interior of $Q$. It is sufficient
that the vector field $f(x)$ is directed everywhere inward on the
boundary of $Q$. It follows that for a trapping region $f^k(Q)$ is
contained in $Q$ for every positive integer $k$. In fact it is
readily shown that the sets $f^k(Q)$ are closed and nested.

2.2.  General  One-Dimensional  Maps 

In the one dimensional case, assume that $f$ is differentiable
almost everywhere (i.e., $f$ fails to be differentiable at no more than a
countably infinite number of points). Then the following
definitions are standard:

An equilibrium point (fixed point) $x$ of a one-dimensional
differentiable map is called stable if $|f'(x)| < 1$, and unstable
if $|f'(x)| > 1$.

Theorem 2: If $\inf |f'(x)| > 1$ in some invariant set $S$, then
$\inf |(f^n)'(x)| > 1$ in $S$. If $\sup |f'(x)| < 1$ in some invariant set $S$,
then $\sup |(f^n)'(x)| < 1$ in $S$.

Proof: By induction. Assume $\inf_S |(f^n)'(x)| > 1$; then, by the chain rule,

$$\inf_S |(f^{n+1})'(x)| = \inf_S |f'(f^n(x))\,(f^n)'(x)| \ge \inf_S |f'(f^n(x))| \cdot \inf_S |(f^n)'(x)| > 1,$$

where $f^n(x)$ remains in $S$ because $S$ is invariant.
The second part is completely analogous. ∎

Even  for  continuous  maps  f,  the  invariant  sets  can  have  a 
strange topology. A classic example is the logistic map $f(x) =
\mu x(1-x)$ for $\mu > 4$. The invariant set for this map has the
structure of the Cantor set [3]. A Cantor set is a closed, totally
disconnected, and perfect set. A set is totally disconnected if it
contains  no  intervals,  and  it  is  perfect  if  every  point  in  it  is 
the  accumulation  point  of  other  points  in  the  set.  A  Cantor  set 
has  a  Lebesgue  measure  of  zero,  even  though  it  contains  an 
uncountably  infinite  number  of  points.  On  the  other  hand,  if  an 
interval  is  an  invariant  set  for  f,  the  following  can  be  asserted: 
Theorem  3:  Closed  invariant  intervals  of  piecewise  continuous 
maps  contain  at  least  one  fixed  point. 

Proof: Let $J = [a,b]$ be the invariant interval. If there were no
fixed point, the graph of $f$ would lie either entirely above or
entirely below the diagonal $(x,x)$ in $J$. Say $f(x) > x$ in $J$; then
clearly $f(b)$ is not in $J$, contradicting the invariance of $J$. ∎

Remark: Open sets may be invariant and contain no fixed points as
long as the limit points are fixed points.

An  interesting  result  on  the  existence  of  periodic  points  is 
the  following,  due  to  Sarkowskii: 

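Theorem 4: (Sarkowskii) Suppose $f: R \to R$ is continuous. Suppose
$f$ has a periodic point of prime period $k$. If $k \triangleright l$, where $\triangleright$ is
the Sarkowskii ordering, then $f$ also has a periodic point of
period $l$.

The Sarkowskii ordering is an ordering of the natural numbers:

$$3 \triangleright 5 \triangleright 7 \triangleright \cdots \triangleright 2\cdot3 \triangleright 2\cdot5 \triangleright \cdots \triangleright 2^2\cdot3 \triangleright 2^2\cdot5 \triangleright \cdots \triangleright 2^3\cdot3 \triangleright 2^3\cdot5 \triangleright \cdots \triangleright 2^3 \triangleright 2^2 \triangleright 2 \triangleright 1$$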

A  proof  of  this  theorem  can  be  found  for  instance  in  [3]. 
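The ordering in the theorem is easy to mechanize. The sketch below (the helper names `sarkovskii_key` and `precedes` are hypothetical) writes each natural number as $2^e \cdot m$ with $m$ odd: numbers with $m > 1$ come first, sorted by $e$ and then by $m$, and the pure powers of two come last in decreasing order.

```python
def sarkovskii_key(n):
    """Sort key for the ordering: a smaller key means earlier (further left)."""
    if n < 1:
        raise ValueError("the ordering is defined on the positive integers")
    exponent = 0
    while n % 2 == 0:
        n //= 2
        exponent += 1
    if n == 1:
        # Pure powers of two close the ordering in decreasing order: ..., 8, 4, 2, 1.
        return (1, -exponent, 0)
    # 2^e * (odd factor > 1): ordered by e first, then by the odd factor.
    return (0, exponent, n)


def precedes(k, l):
    """True if k comes strictly before l in the ordering."""
    return sarkovskii_key(k) < sarkovskii_key(l)


ordered = sorted(range(1, 13), key=sarkovskii_key)
```

With this helper, the statement of the theorem reads: if a continuous map has a point of prime period `k` and `precedes(k, l)` holds, it also has a point of period `l`. In particular, since 3 precedes every other number, a period-3 point forces points of all periods.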


2.3.  Markov Partitions and Markov Maps

A special class of one dimensional dynamical systems are the
piecewise differentiable maps with the following restrictions:

1) There exists a partition $P$ of the real line in a finite or
countable set of disjoint open intervals $\{I_k\}$, such that the map $f$
restricted to the interval $I_k$ is differentiable. Denote by $Q$ the
set of closure points of the intervals $\{I_k\}$. Clearly, $Q$ is a
countable set,

$$Q = R \setminus \bigcup_k I_k \tag{2}$$

2) In each interval $I_k$ the infimum and supremum of $f(x)$ belong
to the set $Q$.

The above conditions may look like a very severe restriction on
the map $f$; however, they seem to be crucial in order to "do the
theory". It is noteworthy that any piecewise continuous map can be
suitably approximated by such a map, by taking finer and finer
partitions of $R$. In particular, in a later chapter we shall study
such an approximation. Any map satisfying the above two
restrictions will be called a Markov map, and the corresponding
partition $P$ a Markov partition. We caution that these definitions
are not standard. Their motivation stems from the following
property:

Theorem 5: If $f$ is Markov with respect to the partition $P = \{I_k\}$,
then for each interval $I_k$, the restriction of $f$ to $I_k$ may be
extended to $\bar{f}_k$ with domain $\bar{I}_k$, the closure of $I_k$, such that

$\bar{f}_k(x) = f(x)$, $x \in I_k$

$\bar{f}_k(x) = \lim f(y_n)$, where $\{y_n\} \subset I_k$ and $y_n \to x \in \partial I_k$, the
boundary of $I_k$;

then $\bar{f}_k(\bar{I}_k)$ is the closure of a countable union of adjacent
intervals from $P$.

Proof: Denote by $a_k$ and $b_k$ respectively the infimum and supremum
of $f$ in the interval $I_k$. Since $f$ is Markov with respect to $P$, $a_k$
and $b_k$ are in $Q$. Hence there exist countably many adjacent open
intervals $I_{k_j}$ such that the set

$$[a_k, b_k] \setminus \bigcup_j I_{k_j} \tag{3}$$

only contains isolated elements from $Q$ (in fact: $Q \cap (a_k, b_k)$). By
construction $\bar{f}_k$ is a continuous function on $\bar{I}_k$. The intermediate
value theorem guarantees that for any $y$ between $a_k$ and $b_k$, and in
particular for the points in (3), there exists an
$x \in (\bar{f}_k^{-1}(a_k), \bar{f}_k^{-1}(b_k))$ in $\bar{I}_k$ such that $\bar{f}_k(x) = y$. Hence
$\bar{f}_k(\bar{I}_k) = [a_k, b_k]$. ∎
Theorem 5 indicates that some of the interesting dynamical
properties of the original system can be studied by "projecting"
it onto the set of intervals $P$. Later on we shall make this notion
more explicit. For now it suffices to say that the evolution of the
system as seen in $P$ behaves like a Markov chain. For this reason,
we shall from now on refer to the partition $P$ as the MACROSCOPIC
STATE SPACE and the intervals of $P$ as the MACROSTATES.

Let $\mathcal{N}$ be the index set for the intervals in the partition
$P$. $\mathcal{N}$ is either the set of natural numbers or the set of integers
if the partition is countably infinite, or the set
$\{1, 2, 3, \ldots, N\}$ if the cardinality of $\mathcal{N}$ is $N$. Clearly $\mathcal{N}$ is a
representation of the macroscopic state space introduced earlier.

The iterated maps $f^n$ of Markov maps are continuous on each
open interval of the form

$$I_{i(1)} \cap f^{-1}(I_{i(2)}) \cap f^{-2}(I_{i(3)}) \cap \cdots \cap f^{-n+1}(I_{i(n)})$$

where the $i(j)$ belong to $\mathcal{N}$.

The canonical macroprojection is the map $\pi: R \to \mathcal{N}$, defined
by $\pi(x) = k \iff x$ is in $I_k$, i.e. $\pi(x)$ identifies the macrostate
to which $x$ belongs.

The itinerary of $x$ is the sequence
$\pi(x), \pi(f(x)), \pi(f^2(x)), \ldots$
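The macroprojection and the itinerary can be sketched in a few lines. The example assumes the tent map on [0, 1] with the two macrostates $I_1 = (0, 1/2)$ and $I_2 = (1/2, 1)$; the map, the list `BREAKPOINTS`, and the function names are illustrative choices, not the paper's.

```python
from bisect import bisect_right
from fractions import Fraction

# Closure points of the assumed partition I_1 = (0, 1/2), I_2 = (1/2, 1).
BREAKPOINTS = [Fraction(0), Fraction(1, 2), Fraction(1)]


def tent(x):
    """Tent map on [0, 1] with slopes +2 and -2."""
    return 2 * x if x <= Fraction(1, 2) else 2 - 2 * x


def macrostate(x):
    """Canonical macroprojection: 1-based index k of the open interval I_k containing x."""
    if x in BREAKPOINTS:
        raise ValueError("x is a closure point and belongs to no macrostate")
    k = bisect_right(BREAKPOINTS, x)
    if k == 0 or k == len(BREAKPOINTS):
        raise ValueError("x lies outside the partitioned interval")
    return k


def itinerary(f, x, n):
    """First n symbols of the itinerary of x under f."""
    symbols = []
    for _ in range(n):
        symbols.append(macrostate(x))
        x = f(x)
    return symbols


symbols = itinerary(tent, Fraction(2, 7), 6)
```

For the period-3 point 2/7 of this map the itinerary repeats the block 1, 2, 2, which is the symbolic image of its cycle.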

Associated with the macroscopic state space, we can introduce
the notion of a macrostate transition matrix $\Pi$ with $ij$-elements

$$\Pi_{ij} = m(f^{-1}(I_i) \cap I_j) \,/\, m(I_j) \tag{4}$$

where $m$ is some (not necessarily finite) measure. $\Pi_{ij}$ is that
fraction of $I_j$ which is mapped into $I_i$. As long as $m$ is such that
each macrostate has a nonzero measure, the macroscopic state space
can be decomposed into transient and recurrent macrostates.

Finally, we shall on occasion also need another transition
matrix $\bar{\Pi}$ with $ij$-elements

$$\bar{\Pi}_{ij} = \begin{cases} 1, & \text{if } f(I_i) \cap I_j \ne \emptyset \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

i.e., $\bar{\Pi}_{ij}$ indicates whether or not it is possible to go from $I_i$ to
$I_j$.
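As a sketch of how the two transition matrices of (4) and (5) might be computed for a piecewise affine map, consider the hypothetical map $f(x) = x + 1/2$ on $I_1 = (0, 1/2)$ and $f(x) = 2 - 2x$ on $I_2 = (1/2, 1)$, with $m$ taken to be Lebesgue measure. The map, the tuple layout of `BRANCHES`, and the function names are assumptions made for illustration; the slopes are assumed to be nonzero.

```python
from fractions import Fraction as F

# Each affine branch is (left end, right end, slope, intercept).
BRANCHES = [
    (F(0), F(1, 2), F(1), F(1, 2)),   # f(x) = x + 1/2 on I_1
    (F(1, 2), F(1), F(-2), F(2)),     # f(x) = 2 - 2x  on I_2
]


def image(branch):
    """Open interval f(I) of an affine branch, returned as (low, high)."""
    left, right, slope, intercept = branch
    ends = (slope * left + intercept, slope * right + intercept)
    return min(ends), max(ends)


def overlap(first, second):
    """Length of the intersection of two intervals given as (low, high)."""
    low = max(first[0], second[0])
    high = min(first[1], second[1])
    return max(high - low, F(0))


def transition_matrices(branches):
    n = len(branches)
    fraction = [[F(0)] * n for _ in range(n)]   # as in (4): share of I_j sent into I_i
    reachable = [[0] * n for _ in range(n)]     # as in (5): 1 if f(I_i) meets I_j
    for j, source in enumerate(branches):
        left, right, slope, _ = source
        for i, target in enumerate(branches):
            hit = overlap(image(source), (target[0], target[1]))
            # An affine branch stretches lengths by |slope|, so divide it out
            # to get the length of the part of I_j that lands in I_i.
            fraction[i][j] = hit / (abs(slope) * (right - left))
            reachable[j][i] = 1 if hit > 0 else 0
    return fraction, reachable


fraction, reachable = transition_matrices(BRANCHES)
```

Each column of `fraction` sums to one here because every branch maps onto a union of macrostates, which is the Markov property of the partition. Note the index convention: `fraction[i][j]` follows (4), while `reachable[i][j]` is indexed from source `i` to destination `j` as in (5).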


The macrostate $I_e$ ($I_c$) is expanding (contracting) for the
Markov map with respect to the Markov partition $P$ if and only if,
for the usual Borel measure $m$, an expanding interval $I_e$ and a
contracting interval $I_c$ satisfy

$$m(f(I_e)) \ge m(I_e)$$

$$m(f(I_c)) \le m(I_c)$$


In  fact,  the  above  formulas  can  be  used  to  define 
expansiveness  and  contractiveness  of  f  with  respect  to  a  more 
general  (nonuniform)  Lebesgue  measure,  but  we  shall  not  pursue 
this  yet. 


2.4.  Affine  Maps 


In this section, the map $f: R \to R$ is restricted to be
piecewise affine (also called piecewise linear). This means that
there exists a partition $P$ of the real line in a finite or
countable set of disjoint open intervals $\{I_k\}$, such that the map $f$
restricted to the interval $I_k$ is affine, i.e. there exist
constants $\alpha_k$ and $\beta_k$ such that for each $x$ in $I_k$

$$f(x) = \alpha_k x + \beta_k$$

The set $Q$ contains all the points where $f$ is not differentiable
("knee points" and points of discontinuity). The map $f$ is then
completely specified by $Q$ and the set of indexed pairs

$$\{(\alpha_k, \beta_k) \mid k \in \mathcal{N}\}$$

corresponding to the intervals $\{I_k\}$ of $P$.

From $Q$ derive the family $S$ of closure points of the intervals
$I_k$, i.e. if $I_k = (a_k, b_k)$ is an interval in the partition $P$, then $S$
contains the points $a_k$ and $b_k$. Strictly speaking, we have

$$a_{k+1} = b_k \quad \text{for all } \omega_k \text{ in } Q$$

but using this double notation will facilitate the development
below. It is also customary to denote them by $\omega_k^+$ and $\omega_k^-$. Noting
that the infimum and supremum of $f(\cdot)$ in each interval $I_k$ occur
at points of $S$, it can be expected that this set will play a major
role. In particular, the piecewise affine map $f(\cdot)$ will be a
Markov map in the sense defined above, if for every $x$ in $S$, $f(x)$
is also in $S$. It follows then that if $f(I_i)$ intersects $I_j$, then $I_j$
lies entirely inside $f(I_i)$. Furthermore, the iterated maps $f^n$ are
also piecewise affine, and affine on each open interval of the
form

$$I_{i(0)} \cap f^{-1}(I_{i(1)}) \cap f^{-2}(I_{i(2)}) \cap \cdots \cap f^{-n+1}(I_{i(n-1)})$$

Because of the definition of $S$ we also have that if $f^n(\omega) = \omega$,
then the derivative of the iterated map $(f^n)'(\omega) \ge 0$.

In  reference  to  the  (piecewise  affine)  Markov  maps  defined 
above,  it  is  clear  that  the  condition  i)  implies  that  piecewise 
affine  Markov  maps  can  only  have  unstable  equilibria.  However,  if 
a piecewise affine map has contracting intervals, then it does not
necessarily follow that this map is non-Markovian (provided, of
course, that the contracting part does not intersect the diagonal (x,x)).

By  virtue  of  the  specific  applications  we  have  in  mind,  we 
shall  further  assume  that  the  piecewise  affine  maps  are 
continuous (i.e. the values $f(\omega)$ are well defined for each $\omega$ in
$Q$). This restriction is however not required for any mathematical
reasons.  Moreover,  it  turns  out  that  for  dealing  with  higher 
dimensional  cases,  we  will  have  to  do  away  with  this  assumption. 

Continuous piecewise affine maps can be entirely characterized
by $f(0)$, the set $Q$ and the set of slopes in each of the
intervals of the partition $P$. For ease of bookkeeping, augment the
set $Q$ with 0 and let the intervals of $P$ to the right of the origin
be indexed by positive integers, and the ones to the left by
negative integers, i.e., for positive $k$

$$I_k = (a_{k-1}, b_k), \qquad I_{-k} = (a_{-k}, b_{-(k-1)}) \tag{11}$$

Letting $\alpha_k$ be the slope of $f(\cdot)$ in the interval $I_k$, then the map
$f(\cdot)$ is evaluated by

$$f(x) = f(0) + \alpha_1 (b_1 - a_0) + \alpha_2 (b_2 - a_1) + \cdots + \alpha_i (x - a_{i-1}) \tag{12}$$

for positive $x$ in the interval $I_i$, and by a similar formula for
negative $x$.
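Formula (12) amounts to accumulating the rise over each complete interval to the left of $x$ and adding the partial rise in the interval that contains $x$. A minimal sketch for the non-negative half line follows; the function name `evaluate` and the argument layout are assumptions, and `knots` plays the role of the closure points $a_0 = 0, a_1, a_2, \ldots$ (with $b_k = a_k$).

```python
from fractions import Fraction as F


def evaluate(x, f0, knots, slopes):
    """Evaluate a continuous piecewise affine map at x >= 0, following (12).

    knots  : [a_0 = 0, a_1, ..., a_m], the closure points right of the origin
    slopes : slopes[i] is the slope on (knots[i], knots[i + 1]); len(slopes) == m
    """
    if len(knots) != len(slopes) + 1:
        raise ValueError("need exactly one slope per interval")
    if x < 0:
        raise ValueError("this sketch only covers the non-negative half line")
    value = f0
    for left, right, slope in zip(knots, knots[1:], slopes):
        if x <= right:
            # x lies in this interval: add the partial rise and stop.
            return value + slope * (x - left)
        # x lies further right: add the full rise over this interval.
        value += slope * (right - left)
    raise ValueError("x lies beyond the last knot")


# Tent map on [0, 1] written in this form: f(0) = 0, slopes +2 and -2.
knots = [F(0), F(1, 2), F(1)]
slopes = [F(2), F(-2)]
values = [evaluate(x, F(0), knots, slopes) for x in (F(1, 4), F(3, 4), F(1))]
```

Storing only $f(0)$, the knots, and the slopes guarantees continuity by construction, which is why the intercepts $\beta_k$ do not need to be kept separately.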

For each $x$, define the function $\Lambda_n(x)$ as the "cumulative slope
product in $n$ iterations", i.e.

$$\Lambda_n(x) = \prod_{k=1}^{n} |\alpha(f^k(x))| \tag{13}$$

where $\alpha(y)$ denotes the slope of $f$ at $y$.
If $x$ is a periodic point of prime period $r$, then $\Lambda(\langle x \rangle)$ is a
shorthand notation for $\Lambda_r(x)$. $\langle x \rangle$ denotes then one cycle of the
itinerary through $x$. If the macroscopic state space is infinite, it will
be assumed that all cycles are finite.

A  consequence  is  (the  proof  is  direct  and  omitted): 

Lemma: If $x$ is a periodic point of $f$, then $\Lambda(\langle x \rangle) = \Lambda(\langle f^k(x) \rangle)$ for
all positive integers $k$. Hence $\Lambda(\langle x \rangle)$ is an invariant of the cycle
through $x$.
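The lemma can be checked numerically on a small example. The sketch below assumes the hypothetical map $g(x) = x + 1/2$ on $(0, 1/2)$ and $g(x) = 2 - 2x$ on $(1/2, 1)$, which has the period-4 cycle 13/18, 5/9, 8/9, 2/9; the names `g`, `slope`, and `cumulative_slope_product` are illustrative.

```python
from fractions import Fraction as F


def g(x):
    """Assumed example map: slope 1 on (0, 1/2), slope -2 on (1/2, 1)."""
    return x + F(1, 2) if x < F(1, 2) else 2 - 2 * x


def slope(x):
    """Slope of g on the interval containing x."""
    return F(1) if x < F(1, 2) else F(-2)


def cumulative_slope_product(x, n):
    """Product of |slope(g^k(x))| for k = 1, ..., n, as in (13)."""
    product = F(1)
    for _ in range(n):
        x = g(x)
        product *= abs(slope(x))
    return product


cycle = [F(13, 18), F(5, 9), F(8, 9), F(2, 9)]
products = [cumulative_slope_product(x, len(cycle)) for x in cycle]
```

Every starting point on the cycle gives the same product, because over one full period the same set of slopes is multiplied, only in a rotated order.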

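Denote by $\Sigma_N$ the set of sequences of $\mathcal{N}$, and by $\Sigma_\Pi$ the subset
of all possible itineraries of $f$,

$$\Sigma_\Pi = \{\, s_0, s_1, s_2, \ldots \mid \bar{\Pi}_{s_i s_{i+1}} = 1 \,\} \tag{14}$$

where $\bar{\Pi}$ is the 0/1 transition matrix of (5).
Then we have the following theorem (see e.g. [3]).

Theorem 6: $\Sigma_\Pi$ is a closed subset of $\Sigma_N$, and is invariant with
respect to the shift $\sigma$.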

The  study  of  the  invariant  sets  is  of  interest  to  the  long 
term  behavior  of  these  maps,  in  order  to  assess  the  stability 
properties  of  the  system  they  represent.  It  is  well  known  from 
linear  system  theory  that  there  is  only  one  equilibrium  point  for 
linear systems. This is the origin. If the system is unstable,
then any nonzero initial condition will diverge to infinity, while
the invariant distribution for a stable system is a singularity at
the origin. On the other hand, it is known that many, even
"simple" nonlinear maps allow an invariant distribution,
equivalent to Lebesgue measure. Such measures have a density
function. In many cases the invariant density for a system with
initial condition $x_0$ may very well be independent of $x_0$. The
Birkhoff Ergodic Theorem [4] gives a condition for this
independence: $f$ must be an endomorphism, i.e. $f$ is onto and for
each measurable set $A$, $\mu(A) = \mu(f^{-1}(A))$. The system behaves then as
if it were driven by a stochastic process, even though it is in
fact entirely deterministic. A first pertinent question is therefore the
existence of invariant densities. Once this has been established,
the  uniqueness  of  the  invariant  densities  needs  to  be  determined. 
Finally,  one  needs  to  find  finite  algorithms  to  compute  the 
invariant  density  or  at  least  an  approximation  of  it. 

Due  to  the  complexity,  one  might  resort  to  computer 
simulations  of  such  systems  in  order  to  solve  the  above  problems. 
However,  a  direct  approach  by  simulating  the  system  equations  may 
be  doomed  to  failure,  exactly  because  of  the  possible  chaotic 
nature  of  the  map. 

As an example consider the following pathological case. Let $f$
be the map

$$f: R \to R: \quad x \mapsto \begin{cases} 0, & \text{if } x \text{ is rational } (x \in \mathbb{Q}) \\ a x, & \text{if } x \text{ is irrational } (x \in R \setminus \mathbb{Q}) \end{cases} \tag{15}$$

Computer simulation of the system $x_{k+1} = f(x_k)$ invariably
leads to a stable equilibrium at 0. The exact iteration will
however diverge to infinity for almost any initial condition (with
respect to the Borel measure) if $a > 1$. In the stochastic case,
the addition of a standard white Gaussian noise $w_k$ will
destabilize it with probability one even in the case $a = 1$,
independently of the initial condition. The computer simulation
will yield

$$x_{k+1} = [w_k]$$

where $[w_k]$ is the machine representation of the noise sample, and
is therefore necessarily a rational number. The invariant
distribution one would conclude from a simulation would be the
standard Gaussian. Finally note that the discontinuities in $f$ are
infinite in number.
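The failure of the simulation can be reproduced directly. In the sketch below (function names and the seed are arbitrary choices), the rationality test is implemented with `float.as_integer_ratio`, which succeeds for every finite floating-point number; the irrational branch of (15) is therefore unreachable on a machine, and the simulated state simply reproduces the previous noise sample.

```python
import random


def is_rational(x):
    """Every finite float is an exact ratio of two integers."""
    _, denominator = x.as_integer_ratio()   # raises for inf and nan
    return denominator > 0


def f_machine(x, a=1.0):
    """Map (15) as a computer evaluates it on floating-point numbers."""
    return 0.0 if is_rational(x) else a * x


def simulate(x0, steps, seed=0):
    """Iterate x_{k+1} = f(x_k) + w_k with standard Gaussian noise w_k."""
    rng = random.Random(seed)
    states, noise = [x0], []
    for _ in range(steps):
        w = rng.gauss(0.0, 1.0)
        noise.append(w)
        states.append(f_machine(states[-1]) + w)
    return states, noise


states, noise = simulate(x0=0.3, steps=1000)
```

After the first step the state sequence coincides with the noise sequence, so a histogram of `states` would suggest a standard Gaussian invariant distribution, in contrast with the exact system.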

Another,  perhaps  more  realistic  example  of  such  misbehaving 
computations is the map $f(x) = 2x \pmod 1$, due to Li [5].
Restrict the domain to the interval [0,1]. The unique invariant
measure is known to be Lebesgue measure. Consider now $x \ge 2^{-n}$
stored in an n-bit computer. After sufficiently many iterations,
all initial conditions will end up at zero. It seems that this
misbehavior could be avoided by combining the left shift
(multiplication by 2) with an addition of a random bit in the lsb
position (after all, the transformation is producing information;
see e.g. Shaw [6]). As this is equivalent to the addition of
noise,  it  seems  that  this  may  help  in  the  computation  of  the 
invariant  distribution.  This  paper  will  on  theoretical  grounds 
discuss  the  interplay  between  the  exact  invariant  distribution, 
and  the  distorted  invariant  distribution. 
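The collapse to zero, and the proposed remedy, can be sketched as follows. The floating-point part iterates the doubling map on an IEEE double; the integer part models an n-bit fixed-point register, with the state $x = m / 2^n$ stored as the integer `m`. The function names and the register model are assumptions made for illustration.

```python
import random


def doubling(x):
    """One step of f(x) = 2x (mod 1) in floating point."""
    return (2.0 * x) % 1.0


def iterate(x, steps):
    for _ in range(steps):
        x = doubling(x)
    return x


def shift_only(m, bits, steps):
    """Doubling map on an n-bit register: a left shift that drops the top bit."""
    for _ in range(steps):
        m = (2 * m) % (1 << bits)
    return m


def shift_with_random_bit(m, bits, steps, rng):
    """Left shift followed by a random bit written into the lsb position."""
    for _ in range(steps):
        m = ((2 * m) % (1 << bits)) | rng.getrandbits(1)
    return m


collapsed_float = iterate(0.1, 60)             # all mantissa bits shifted out
collapsed_register = shift_only(0b1011, 4, 4)  # 4-bit register after 4 shifts
refreshed = shift_with_random_bit(0b1011, 16, 1000, random.Random(1))
```

Each shift discards the leading bit and exposes no new information at the low end, so a register of `bits` bits is exhausted after `bits` steps; the random lsb replaces the information that the exact map would have drawn from the infinite binary expansion of the initial condition.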

The  reason  for  our  concern  with  these  problems  is  that  the 
knowledge  of  the  invariant  distribution  is  of  importance  in 
obtaining approximate filtering algorithms for nonlinear systems.

The  topics  of  existence,  uniqueness,  and  computation  of  the 
invariant  density  are  dealt  with  in  the  following  section. 

2.5.  Invariant  Measures  for  Piecewise  Affine  Markov  Maps 

In  order  to  find  the  dynamics  of  the  distribution  of  the 
deterministic  system  many  approaches  can  be  taken.  For  continuous 
time  systems,  we  may  take  a  stochastic  point  of  view,  and 
consider  a  dynamical  system,  perturbed  by  external  noise.  (This 
is  the  approach  that  will  be  considered  in  the  stochastic  section 
of  the  paper).  Even  with  a  well  specified  initial  condition,  a 
smooth  probability  density  for  the  state  may  evolve.  The  well 
known Fokker-Planck equation describes the evolution or flow of
the density. In fact, the density itself may be viewed as the
underlying  dynamical  quantity,  whose  dynamics  are  then  governed  by 
the  Fokker-Planck  equation.  Such  an  approach  does  however 
necessitate  the  use  of  (infinite  dimensional)  Banach  or  more 
general  function  spaces.  Conceptually  we  can  then  let  the  variance 
of  the  external  noise  approach  zero,  so  as  to  obtain  the  original 
deterministic  system.  With  random  initial  conditions,  these  are 
usually  referred  to  as  crypto-deterministic  systems,  and  have 
received  a  great  deal  of  attention  in  statistical  mechanics.  In 
this  limit  case,  there  will  obviously  no  longer  be  a  diffusion 
term  in  the  Fokker-Planck  equation.  The  resulting  first  order 
partial  differential  equation  is  commonly  referred  to  as  the 
Liouville  equation. 

The evolution operator induced by the Liouville equation has
its discrete analog in the Frobenius-Perron operator. If $f$ is a
measurable function, nonsingular with respect to the Lebesgue
measure, mapping [0,1] into [0,1], then the operator

$$P_f \varphi(x) = \frac{d}{dx} \int_{f^{-1}([0,x])} \varphi(s)\,ds \tag{16}$$

has the properties:

1) $P_f$ is positive (i.e. $\varphi \ge 0 \Rightarrow P_f \varphi \ge 0$)

2) $P_f$ preserves integrals

$$\int_0^1 P_f \varphi \, dm = \int_0^1 \varphi \, dm, \qquad \varphi \in L_1([0,1]) \tag{17}$$

3) $P_{f^n} = (P_f)^n$

4) $P_f \varphi_0 = \varphi_0 \iff d\mu = \varphi_0\, dm$ is invariant under $f$

(i.e. for all measurable $A$: $\mu(f^{-1}(A)) = \mu(A)$)

Remark:  The  Perron-Frobenius  operator  can  be  defined  more 
generally,  in  its  integrated  form,  for  abstract  measure  spaces 
[7]. 


The invariant density of the map $f$ is then nothing else than
the fixed point of the Perron-Frobenius operator. The following
theorem by Lasota and Yorke [8] expresses the existence of the
invariant density $\varphi^*$ under certain conditions.

Theorem 7: If $f: [0,1] \to [0,1]$ is

a) piecewise $C^2$ (with a finite partition)

b) $\inf |(f^n)'| > 1$ for some $n$,

then: for all $\varphi \in L_1[0,1]$

$$\frac{1}{n} \sum_{k=0}^{n-1} P_f^k \varphi \to \varphi^* \in L_1 \tag{18}$$

where the convergence is in the norm. Furthermore $\varphi^*$ has the
following three properties:

1) $\varphi \ge 0 \Rightarrow \varphi^* \ge 0$

2) $\int_0^1 \varphi^* \, dm = \int_0^1 \varphi \, dm$

3) $P_f \varphi^* = \varphi^*$, consequently $d\mu = \varphi^* dm$ is invariant under $f$.

If instead of b), we have the more restrictive condition,

b') $\inf_x |f'(x)| > 1$

then also 4) holds

4) $\varphi^*$ has bounded variation. In fact, there is a constant $c$,
independent of $\varphi$ such that

$$\mathrm{Variation}(\varphi^*) \le c\, \|\varphi\|$$

The piecewise affine maps introduced earlier fall within the
class of the theorem, as long as there are a finite number of
intervals in the partition. Obviously such a map is piecewise $C^2$,
and

$$\inf_x |f'(x)| > 1 \iff \inf_{k \in \mathcal{N}} |\alpha_k| > 1$$

or

$$\inf_x |(f^n)'(x)| > 1 \text{ for some } n \iff \inf_x \Lambda_n(x) = \inf_x \prod_k |\alpha[f^k(x)]| > 1$$
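For a piecewise affine Markov map, the averaging in (18) can be carried out exactly on densities that are constant on each macrostate, where the Frobenius-Perron operator reduces to a matrix. The sketch below assumes the hypothetical map $f(x) = x + 1/2$ on $I_1 = (0, 1/2)$, $f(x) = 2 - 2x$ on $I_2 = (1/2, 1)$, for which $\inf |(f^2)'| = 2 > 1$; the matrix `P` and the function names are illustrative and not taken from the paper.

```python
# (P_f phi)(y) = sum over preimages x of y of phi(x) / |f'(x)|.
# A point of I_1 has one preimage, in I_2 (slope -2).
# A point of I_2 has one preimage in I_1 (slope 1) and one in I_2 (slope -2).
P = [[0.0, 0.5],
     [1.0, 0.5]]


def apply_operator(matrix, phi):
    """Apply the matrix form of P_f to a piecewise constant density."""
    return [sum(matrix[i][j] * phi[j] for j in range(len(phi)))
            for i in range(len(matrix))]


def cesaro_average(matrix, phi, n):
    """(1/n) * sum_{k=0}^{n-1} P_f^k phi, the average appearing in (18)."""
    total = [0.0] * len(phi)
    current = list(phi)
    for _ in range(n):
        total = [t + c for t, c in zip(total, current)]
        current = apply_operator(matrix, current)
    return [t / n for t in total]


phi_star = cesaro_average(P, [1.0, 1.0], 2000)
integral = 0.5 * (phi_star[0] + phi_star[1])   # both macrostates have length 1/2
```

Starting from the uniform density, the averages approach the density with value 2/3 on $I_1$ and 4/3 on $I_2$, which is a fixed point of `P` and integrates to one, in line with properties 2) and 3) of the theorem.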

The condition a) of the theorem is automatically satisfied. The
condition b') seems restrictive. It means that all regions of
continuity of $f$ must be expanding. The less restrictive condition
(at the cost of losing the bounded variation property) leads to
the following

Theorem  8 ;  A  (finitely)  piecewise  affine  map  has  an  invariant 
density  if  there  exists  an  integer  ng  such  that  for  all  initial 
conditions,  the  average  expansion  over  ng  iterations  is  positive. 
The  average  expansion  in  n  steps  is  defined  as 

1  n-1 

. 2  A(fk(x)j  <19> 

n  k-0 

where 

«  log  jaj  (20) 

Note that for general piecewise affine maps, a condition still
needs to be checked over all initial conditions. For many
iterations ($n_0$) the resulting iterated map rapidly becomes very
jagged. It is here that the subclass of Markovian maps is
extremely useful. It allows the search for the average growth
condition to be restricted to the macrostates N rather than the
original state space. Using the transition matrix $\Pi$ defined
earlier, the subset $\Sigma$ of all allowable sample sequences may be
constructed. If these sequences are represented as a horizontal
tree, for which the slopes of the branches correspond to the
"growth" factor $\lambda$, then the above theorem states that for some
level (of branching) $n_0$, all nodes at level $n_0$ of the tree must
be above the zero-level line.
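The tree test described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' procedure: the function names, the boolean matrix `allowed` (with `allowed[i][j]` true when a transition from macrostate i to macrostate j is possible), and the cap `n_max` are assumptions introduced here.

```python
import math

def min_path_growth(allowed, slopes, n):
    """Smallest accumulated log-growth over all allowable macrostate
    sequences of length n (the lowest node at level n of the tree)."""
    logs = [math.log(abs(a)) for a in slopes]
    size = len(slopes)
    # best[j]: minimal sum of log|a| over allowable sequences ending in j
    best = logs[:]
    for _ in range(n - 1):
        nxt = [math.inf] * size
        for i in range(size):
            for j in range(size):
                if allowed[i][j]:
                    nxt[j] = min(nxt[j], best[i] + logs[j])
        best = nxt
    return min(best)

def find_n0(allowed, slopes, n_max=50):
    """First level n at which every node lies above the zero line,
    or None if no such level is found up to n_max (hypothetical cap)."""
    for n in range(1, n_max + 1):
        if min_path_growth(allowed, slopes, n) > 0:
            return n
    return None
```

For example, with slopes 0.5 and 4 and a contracting macrostate that must always be followed by the expanding one, `find_n0([[False, True], [True, True]], [0.5, 4.0])` finds level 2, whereas allowing the contracting macrostate to repeat gives no such level.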

In case the map f is piecewise $C^2$, but with a countably
infinite partition, the extension of the theorem of Lasota and
Yorke [8] needs to be used. Its conditions are, however, much more
restrictive.

Theorem 9: Let f: [0,1] --> [0,1] be countably piecewise $C^2$ such
that

i) $\inf_x |f'(x)| > 2$, $\sup_x |f''(x)| < \infty$

ii) in almost all macrostates, f is onto [0,1],

then the conclusions of Theorem 8 remain valid.

Clearly,  in  the  context  of  piecewise  affine  maps,  the 
condition  ii)  in  particular  restricts  the  maps  to  very  special, 
uninteresting  cases.  A  more  interesting  variant  is  based  on 
Adler's  theorem  [9,10]  for  a  map,  f:  I  -->  I,  satisfying  the 
properties: 

There  exists  a  partition  P  of  the  interval  I  into  a  finite  or 
countable  collection  of  disjoint  open  intervals  (Ik)  such  that 

1) f is defined on $\bigcup_k I_k$, and $m(I \setminus \bigcup_k I_k) = 0$.

2) $f|_{I_k}$ is strictly monotonic and extends to a $C^2$ function
on the closure of $I_k$.

3) If $f(I_k) \cap I_j \ne \emptyset$ then $f(I_k)$ contains $I_j$.

4) There is an R so that

$$\bigcup_{n=1}^{R} f^n(I_k) \text{ contains } I_j \text{ for all } k \text{ and } j.$$

Theorem 10 (Adler): Let f: I --> I satisfy the above properties,
and let

$$M = \sup_{I_k} \sup_{y,z \in I_k} \frac{|f''(z)|}{f'(y)^2} < +\infty, \qquad \inf_x |(f^n)'(x)| > 1 \qquad (21)$$

then f admits an invariant finite measure $d\mu = \varphi(x)\,dx$ with $\varphi(x)$
bounded away from 0 and $+\infty$.

If this is applied to the piecewise (finite or countable)
affine Markov maps, the sufficient conditions are that the average
expansion for some $n_0$ is strictly positive, and that f is a Markov
map corresponding to a positive recurrent chain. The type of map
defined above is called "Markov map" in the literature, but it is
obviously more restrictive than our definition of Markov maps.
Note that the theorem still requires the expansiveness of the map
at all points. Bowen [11] provided a method ("inducing") which
enabled the conditions of Adler's theorem to be relaxed to certain
nonexpansive maps.

Another  method  exists  to  extend  the  range  of  validity  of  the 
above  theorems  by  introducing  the  equivalence  of  Conjugation.  Two 
transformations  f:  I  -->  I  and  g:  J  -->  J  are  conjugate  if  there 
exists  a  homeomorphism  h:  I  -->  J  such  that  (see  e.g.  [3]) 


$$g(x) = h(f(h^{-1}(x))) \qquad (22)$$


This is a generalization of the notion of a similarity
transformation in linear system theory. It is then straightforward
to show that if h is differentiable, and if $\varphi_f$ is the stationary
density of f, then $\varphi_g$, the stationary density of g, is given by

$$\varphi_g(x) = \varphi_f(h^{-1}(x)) \left| \frac{dh^{-1}(x)}{dx} \right| \qquad (23)$$
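A standard textbook illustration of (22) and (23), not taken from this report, is the conjugacy between the tent map and the logistic map. The sketch below assumes f is the tent map on [0,1] (whose stationary density is uniform) and h(x) = sin^2(pi x / 2); all function names are introduced here for illustration only.

```python
import math

def tent(x):
    """Tent map f on [0,1]; its stationary density is uniform."""
    return 2 * x if x < 0.5 else 2 * (1 - x)

def h(x):
    """Assumed conjugating homeomorphism h: [0,1] -> [0,1]."""
    return math.sin(math.pi * x / 2) ** 2

def h_inv(x):
    return (2 / math.pi) * math.asin(math.sqrt(x))

def g(x):
    """Conjugate map g = h o f o h^{-1}, as in (22)."""
    return h(tent(h_inv(x)))

def density_g(x, eps=1e-6):
    """Stationary density of g from (23), with phi_f = 1 and the
    derivative of h^{-1} taken by a central difference."""
    phi_f = 1.0
    dh_inv = (h_inv(x + eps) - h_inv(x - eps)) / (2 * eps)
    return phi_f * abs(dh_inv)
```

Under these assumptions g coincides with the logistic map 4x(1 - x), and `density_g` reproduces its known density 1 / (pi sqrt(x(1 - x))) at interior points.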


Hence  in  order  to  check  the  existence  of  an  invariant  density, 
it  suffices  to  find  a  conjugate  map  which  is  known  to  have  an 
invariant  density.  The  conjugating  function  h  does  not  need  to  be 
piecewise  linear.  Grossmann  and  Thomae  [12]  have  shown  that 
conjugating  functions  may  be  constructed  by  relating  two  dynamical 
laws  to  each  other.  The  resulting  conjugating  function  h  may  have 
the  structure  of  a  Cantor  function  however. 

Finally, conditions for the uniqueness of the stationary
density were derived by Li [5,12]. Li also provided a converging
approximation method to compute the unique invariant density. The
proof of convergence settled a long standing conjecture by Ulam
[13]. The approximation stems from considering an arbitrary
partitioning of the domain into finitely many disjoint intervals. On
these a Markovian transition matrix $\Pi$ is defined. With this
construct, the Perron-Frobenius operator is effectively
approximated (exactly for Markov maps) by this transition matrix $\Pi$.
The fixed point is then found very efficiently by solving a
quadratic programming problem, i.e.

$$\text{minimize } \|\Pi\varphi - \varphi\| \quad \text{subject to } \varphi_i \ge 0 \text{ and } \sum_i \varphi_i = 1$$
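A minimal sketch of this finite-partition approximation follows. It departs from the text in two labeled ways: the matrix entries are estimated by sampling points in each cell, and the fixed point is found by plain power iteration rather than by the quadratic program, which assumes the resulting chain is aperiodic. The tent map, the function names, and the sample counts are illustrative assumptions.

```python
def ulam_matrix(f, n_cells, samples_per_cell=10):
    """Column-stochastic matrix: Pi[i][j] is the fraction of sample
    points of cell j that f maps into cell i (map on [0,1])."""
    counts = [[0] * n_cells for _ in range(n_cells)]
    for j in range(n_cells):
        for s in range(samples_per_cell):
            x = (j + (s + 0.5) / samples_per_cell) / n_cells
            i = min(int(f(x) * n_cells), n_cells - 1)
            counts[i][j] += 1
    return [[c / samples_per_cell for c in row] for row in counts]

def fixed_point(Pi, iterations=200):
    """Power iteration phi <- Pi phi, started from a point mass
    (swapped in for the quadratic program; assumes aperiodicity)."""
    n = len(Pi)
    phi = [1.0 if i == 0 else 0.0 for i in range(n)]
    for _ in range(iterations):
        phi = [sum(Pi[i][j] * phi[j] for j in range(n)) for i in range(n)]
    return phi

def tent(x):
    return 2 * x if x < 0.5 else 2 * (1 - x)
```

For the tent map on 8 equal cells the partition is Markov, so the matrix is exact and `fixed_point(ulam_matrix(tent, 8))` assigns equal probability to each cell, matching the uniform invariant density.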


2.6.  Reconciliation  with  the  theory  of  Markov  Chains 

This paragraph justifies the name "Markovian Map" introduced
earlier for a certain class of maps. Indeed, the transition matrix
$\Pi$ for the macrostates introduced in (4) satisfies

$$\Pi_{ij} \ge 0 \text{ for all } i,j \in N, \qquad \sum_i \Pi_{ij} = 1 \text{ for all } j \in N \qquad (24)$$

It can be shown that, given the above properties for a matrix $\Pi$ and
given any distribution $(\varphi_i)$ such that

$$\varphi_i \ge 0, \qquad \sum_i \varphi_i = 1$$

there exists a probability space, and random variables $X_n$, $n \ge 0$,
on that space satisfying the Markovian property (a kind of
"Huygens principle" obeyed by nonhereditary systems):

$$P(X_0 = x_0, \ldots, X_n = x_n) = \Pi_{x_n, x_{n-1}} \cdots \Pi_{x_1, x_0} \, \varphi_{x_0} \qquad (25)$$


Therefore  all  of  the  theory  of  Markov  chains  becomes  available  in 
the  theory  for  crypto-deterministic  systems.  In  particular,  the 
decomposition  theorem  of  the  state  space  leads  to  a  decomposition 
theorem  for  the  macrostate  space. 
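As a small illustration of the product formula (25), the probability of a macrostate sequence can be accumulated factor by factor. The sketch assumes the column convention of (24), i.e. `Pi[i][j]` is the probability of moving from macrostate j to macrostate i; the function name and the example values are hypothetical.

```python
def path_probability(Pi, phi0, path):
    """Probability of the macrostate sequence path = [x0, ..., xn]
    under (25), with Pi[i][j] = P(next = i | current = j)."""
    p = phi0[path[0]]
    for prev, nxt in zip(path, path[1:]):
        p *= Pi[nxt][prev]
    return p
```

With `Pi = [[0.5, 1.0], [0.5, 0.0]]` and all initial mass on macrostate 0, the sequence 0, 1, 0 has probability 0.5, while 0, 1, 1 is not allowable and has probability 0.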

There  are  recurrent  and  transient  states.  For  a  recurrent 
state,  the  probability  that  the  chain  starting  in  that  state 
returns  to  that  state  is  one,  while  it  is  strictly  less  than  one 
for  a  transient  state. 

Theorem  11:  The  set  of  recurrent  states  is  the  union  of  a  finite 
or  countably  infinite  number  of  disjoint  irreducible  closed  sets. 




The  proof  can  be  found  in  any  elementary  book  on  stochastic 
processes,  e.g.  [14].  It  basically  expresses  the  fact  that  "two 
way  communication"  among  states  is  an  equivalence  relation.  If, 
for  a  recurrent  state  the  mean  return  time  is  finite,  then  the 
state  is  called  positive  recurrent.  If  the  mean  return  time  is 
infinite,  the  state  is  called  null  recurrent.  If  S  is  a  finite 
irreducible  closed  set,  then  all  states  in  S  are  positive 
recurrent. For the set of positive recurrent states $\Sigma^+$ (strongly
ergodic states), the following theorem is well known [15].

Theorem 12: i) If $\Sigma^+ = \emptyset$ then no stationary distribution
exists.

ii) If $\Sigma^+$ is nonempty and irreducible, then a unique
stationary distribution exists.

iii) If $\Sigma^+$ is nonempty and reducible, then infinitely
many stationary distributions exist.
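For a finite macrostate chain, where every recurrent state is positive recurrent, the case distinction of Theorem 12 reduces to counting the closed irreducible classes. The following sketch does this by reachability; it again assumes the column convention (`Pi[i][j] > 0` means macrostate j can move to macrostate i), and the function name is introduced here for illustration.

```python
def closed_classes(Pi):
    """Closed irreducible classes of a finite chain.
    Pi[i][j] > 0 means a transition from state j to state i is possible."""
    n = len(Pi)
    reach = []
    for j in range(n):
        # depth-first search for the states reachable from j
        seen, stack = {j}, [j]
        while stack:
            s = stack.pop()
            for i in range(n):
                if Pi[i][s] > 0 and i not in seen:
                    seen.add(i)
                    stack.append(i)
        reach.append(seen)
    classes = []
    for j in range(n):
        # j is recurrent iff every state reachable from j leads back to j
        if all(j in reach[i] for i in reach[j]):
            cls = frozenset(reach[j])
            if cls not in classes:
                classes.append(cls)
    return classes
```

One class corresponds to case ii) (a unique stationary distribution); two or more classes correspond to case iii), since any convex mixture of the per-class stationary distributions is again stationary.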

The Markov chain theory gives information about the behavior
of the Markov maps introduced earlier. The problem is that it only
provides information on the macrostate space, and not on the
microstate space. In fact, as evidenced by Sarkovskii's theorem,
the structure at the microscopic level can be much richer than
what can be inferred from the macroscopic picture. What is
worse, the existence of a macroscopic stationary distribution
does not necessarily imply the existence of a microscopic
stationary distribution.

2.7.  Minimal  Systems 

In  linear  system  theory,  an  important  role  is  played  by  the 
invariant  subspaces,  especially  in  realization  theory. 
Characterization of the invariant subspaces leads to the
decomposition of the system into reachable and non-reachable
subsystems, and observable and non-observable ones. A linear system
is minimal if there is no lower order system realizing the same
input-output map. Minimal systems are important because of their
joint reachability and observability. It is known that the role of
invariant subspaces for linear systems can be extended to
nonlinear system theory [16]. We shall first extend the notion of
minimality. The proper way to do this turns out not to rely
solely on the dimension, or order, of the system. Rather, the
invariant distributions are taken as the defining property.

Definition: A system $x_{k+1} = f(x_k)$ is minimal if no proper subset
of the state space is invariant under the action of f.

In  this  section,  we  consider  the  separation  of  the  dynamical 
system  into  minimal  subsystems.  By  definition,  each  subsystem  is 
invariant.  Hence,  the  knowledge  of  the  initial  condition  allows 
one  to  delimit  the  appropriate  state  space.  (This  is  somewhat  an 
abuse  of  terminology,  since  in  the  proper  sense,  the  set  of  states 
does  not  have  the  structure  of  a  vector  space). 

Suppose now that it is known that the disjoint intervals $J_1$,
$J_2$, ... are invariant for the respective restrictions $f|_{J_1}$,
$f|_{J_2}$, .... Without loss of generality, we also assume that these
intervals are disconnected, in the sense that for any two
intervals $J_i$, $J_j$,

$$\mathrm{span}(J_i, J_j) \setminus (J_i \cup J_j) \ne \emptyset$$

Span is the closure of the set of convex combinations of
points of $J_i \cup J_j$. The problem then is to find the proper
conditions under which the intervals, known to be invariant under
the restriction of f to these intervals, remain invariant under f.

Geometrically, the picture is obvious for continuous f. If
J = (a,b) is invariant under $f|_J$, then by the definition of
invariance, $f|_J^{-1}(J) \subset J$ and $f|_J(J) \subset J$. However,
invariance under f also requires $f^{-1}(J) \subset J$ and $f(J) \subset J$. A
necessary condition for this is that f(a) = a and f(b) = b, as
can be seen very easily from a geometric picture.

Thus the problem boils down to the properties of the map
f in the interconnecting intervals of the form

$$\mathrm{span}(J_i, J_j) \setminus (J_i \cup J_j) \ne \emptyset$$

Several possibilities can occur:

a) The above interval is itself invariant; then the whole of
span($J_i$, $J_j$) is invariant.

b) If, say, the conditions f(a) = a and f(b) = b are not satisfied
for either $J_i$ or $J_j$, then "spillover" will occur, and the
intervals invariant under the restricted map will no longer be
invariant under f.

c)  If  the  interconnecting  interval  contains  a  stable  equilibrium, 
then  it  will  also  contain  a  trapping  set,  and  a  smooth 
invariant  distribution  will  not  exist. 

The  above  discussion  leads  to  the  following  theorem: 

Theorem 13: Intervals invariant under the restriction $f|_J$
either remain invariant (for which a necessary condition is that
the graph of f leaves the "box" (J, f(J)) at the diagonal points
($\partial J$, f($\partial J$))), or can be embedded into larger invariant
intervals. If the intervals remain invariant, then the
interconnecting interval breaks down into alternating trapping
sets and invariant (sub)intervals.

Note that if $J_1$ and $J_2$ are adjacent open invariant intervals,
then the closure of their union will be invariant; however, each
constitutes a minimal system.

Finally,  we  shall  also  note  that  a  trapping  set  is 
characterized  by 

$$f(\partial I_a) = \partial f(I_a)$$

where $\partial I$ is the set of limit points (boundary) of I.


2.8.  The  Connection  with  the  Stochastic  Case 

In the previous sections, we investigated the minimal systems.
These were characterized by the fact that a smooth invariant
density exists in the state set. As a result, a phenomenon called
deterministic diffusion occurs, even though no stochastics entered
the system [17].

Consider now the case where a stochastic driving term enters
the system equation. The local transition model is then:

$$x_{k+1} = f(x_k) + \sigma(x_k) \, u_k \qquad (26)$$

where the sequence $\{u_k\}$ is a standard white Gaussian noise,
uncorrelated with the initial state of the system $x_0$. In this
paper we shall only deal with the simple case where the variance
$\sigma^2$ is constant over all of S.

The  implication  of  this  assumption  is  that  the  original 
invariant  distributions  of  the  (minimal)  subsystems  will  be 
"clouded"  over  into  one  smooth  invariant  distribution  [18].  Of 
course,  we  assume  that  such  an  invariant  distribution  exists.  A 
sufficient  condition  is,  roughly  speaking,  that  the  system  is 
"eventually  stable"  for  sufficiently  large  states  x.  More 
precisely: 

$$\limsup_{|x| \to \infty} |f(x)| / |x| < 1 \qquad (27)$$

If the Lebesgue measures of the minimal state spaces are large
compared to the variance $\sigma$, then the overall stochastic invariant
distribution will be "close", in a sense to be made more precise
later, to a convex sum (mixture) of the invariant distributions of
the deterministically minimal systems.

If, on the other hand, the noise variance $\sigma$ is much larger
than the measures of the minimal systems, then it is expected that
the steady state distribution will be close to the Gaussian
distribution, and in fact computable from a global linearization
of the original nonlinear system. Clearly, the intermediate cases
are the difficult ones.

Several  approaches  are  possible  for  the  solution  of  this 
problem.  One  is  to  consider  the  space  of  all  densities,  and  define 
locally  the  system  by  an  evolution  in  this  infinite  dimensional 
space.  Its  advantage  is  that  no  stochastics  as  such  needs  to  enter 
in  the  picture,  and  the  deterministic  concepts  can  be  used,  albeit 
for an infinite dimensional system. Alternatively, and this is the
route taken here, a new macrostate transition is computed or
approximated from consideration of the original transition
probability matrix $\Pi$ and the "overlap" of the $\sigma$-neighborhoods with
adjacent domains.

3.  STOCHASTIC  SYSTEMS 

In this section we consider the properties of the dynamic
affine system when it is driven by a white Gaussian noise
sequence. The model to be considered is given by

$$x_{k+1} = f(x_k) + w_k \qquad (28)$$

where f(x) is a piecewise affine map as defined above, and $\{w_k\}$ is
a white Gaussian noise sequence with zero mean and variance $\sigma^2$.
In the deterministic case it has been shown that the system can be
characterized by invariant aggregations of the macrostates in P.
It has also been found that some contracting macrostates
(namely, those intersecting the line f(x) = x) contain
equilibrium points. The effect of the noise is to allow
transitions out of these intervals. The probability of these
transitions can be made very small if the noise variance is small.
For aggregations containing expanding intervals we obtain minimal
subsystems with invariant distributions that do not leave the
aggregation of such intervals. The addition of noise to the system
allows for interactions among these minimal systems and smooths
the resulting distribution on the aggregations of macrostates.
The noise allows for more interactions among the macrostates if
some restrictive assumptions are made. If the noise variance is
small relative to the measure of the contracting macrostates, the
system should stay near the equilibrium point (if it exists for
such macrostates) with high probability, with some small
probability of a transition to other macrostates. Alternatively,
if the noise variance is large relative to the measure of the
expanding macrostates, then the system is expected to escape these
states with high probability. We formally derive the conditions
that allow the approximation of such behavior by a Markovian
transition among purely linear dynamic models, for the following
simple problems. This approximation has been considered for the
three-region scalar case in [19].

Consider the affine model for f(x)

$$f(x) = \alpha_i x + \beta_i, \qquad a_i < x < b_i, \quad i = 1, 2, \ldots, N \qquad (29)$$

where the interval $I_i = (a_i, b_i)$ represents a macrostate of the
system. Suppose the probability density of $x_k$ is given by $p_k(x)$;
then we can write the density as a sum of conditional densities
$p_{ki}(x)$ given by the following expressions

$$p_k(x) = \sum_{i=1}^{N} P_{(k-1)i} \, p_{ki}(x) \qquad (30)$$

$$p_{ki}(x) = \frac{1}{P_{(k-1)i}} \int_{y \in I_i} \frac{1}{\sigma} \, q\!\left(\frac{x - \alpha_i y - \beta_i}{\sigma}\right) p_{(k-1)}(y) \, dy \qquad (31)$$

where the $P_{ki}$ are the probabilities of being in macrostate $I_i$ at
time k, and are given by

$$P_{ki} = \int_{x \in I_i} p_k(x) \, dx \qquad (32)$$

and where q(x) is the unit Gaussian density function,

$$q(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2). \qquad (33)$$

It should be noted that (31) is a recursive relation that
determines the evolution of the density of the state of the system
$x_k$, assuming we started from some initial density. If we now
also define the transition probability among the macrostates
represented by the intervals $\{I_i\}$ by $\Pi(k)$ with ij-elements

$$\Pi_{ij}(k) = P\{ x_{k+1} \in I_j \mid x_k \in I_i \} \qquad (34)$$

then the transition probabilities are given by the expression

$$\Pi_{ij}(k) = \int_{x \in I_j} p_{(k+1)i}(x) \, dx. \qquad (35)$$

We now assume that a steady-state solution to the problem
given above exists, which would be the case under the restrictions
imposed by the deterministic model, namely that an invariant
distribution exists. The problem is to determine the condition for
the convergence of this model to the one given by the Markov
switched model, formulated as

$$x_{k+1} = \alpha_i x_k + \beta_i + w_k, \quad \text{when } S(k) = i \qquad (36)$$

where S(k) is the state of an underlying finite state Markov
process taking values i = 1, 2, ..., N, with transition
probability matrix $\Pi$. The approximation lies in the
assumption that the Markov process is independent of the original
state of the system, and in assuming that the linear models are
not restricted by the values of the state. This approximation
implies that the process {S(k)} represents the macrostates of the
original system.

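The steady state solution satisfies the relations

$$p(x) = \sum_{i=1}^{N} P_i \, p_i(x) \qquad (37)$$

$$p_i(x) = \frac{1}{P_i} \int_{y \in I_i} \frac{1}{\sigma} \, q\!\left(\frac{x - \alpha_i y - \beta_i}{\sigma}\right) p(y) \, dy \qquad (38)$$

$$P_i = \int_{x \in I_i} p(x) \, dx \qquad (39)$$

$$\Pi_{ij} = \int_{x \in I_j} p_i(x) \, dx \qquad (40)$$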

The transition matrix $\Pi$ leads in turn to steady state
probabilities of the states S(k), given by $\{P_i\}$, as can be seen
from (37)-(40). The question therefore becomes to find the
conditions under which $p_i(x)$ is approximately equal to the
stationary density of the linear model given by the parameters
$\alpha_i$ and $\beta_i$. We are only interested in sufficient conditions, hence we
restrict our attention to the cases of relatively large
contracting regions and small expanding regions relative to the
noise variance. The effects of each of these regions will be
considered in the following subsections.

First we shall postulate certain assumptions on the transition
probability matrix $\Pi$ and then show that these are satisfied
for the resulting system under the imposed conditions. It will be
assumed that the contracting regions are large relative to the
noise variance, hence we postulate that

$$1 - \Pi_{ii} \ll 1, \quad \text{for } I_i \text{ contracting} \qquad (41)$$

which implies that the escape probability from such regions is
very small. Similarly, it will be assumed that the expanding
regions are small relative to the noise variance, hence we
postulate that

$$\Pi_{ii} \ll 1, \quad \text{for } I_i \text{ expanding} \qquad (42)$$

which implies that the system exits an expanding region with high
probability. The resulting solutions for the stationary
probabilities $P_i$ satisfying the equation

$$P_i = \sum_j P_j \, \Pi_{ji} \qquad (43)$$

have the property that the $P_i$ are very small for expanding
regions. These assumptions will be used to derive the validity of
the switched Markov approximation, and in turn they too will be
verified.

3.1.  Contracting  Intervals 


Since we assume that the stationary probabilities of the
expanding regions are small, we may restrict our attention to
the contributions of the contracting regions. For large
contracting intervals the stationary probability
density of the switched Markov approximation is given by a
weighted sum of Gaussian densities, $q_i(x)$, with means

$$\mu_i = \beta_i / (1 - \alpha_i) \qquad (44)$$

and variances

$$\sigma_i^2 = \sigma^2 / (1 - \alpha_i^2). \qquad (45)$$

Since the regions are contracting, the mean falls within the
interval $I_i$,

$$a_i < \mu_i < b_i.$$
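The stationary mean and variance of each contracting linear model follow from a short moment calculation. It is sketched here under the assumptions that the switching state remains at S(k) = i, that $|\alpha_i| < 1$, and that $w_k$ is independent of $x_k$:

```latex
\begin{align*}
x_{k+1} &= \alpha_i x_k + \beta_i + w_k, \\
\mu_i &= \alpha_i \mu_i + \beta_i
  \;\Longrightarrow\; \mu_i = \frac{\beta_i}{1 - \alpha_i}, \\
\sigma_i^2 &= \alpha_i^2 \sigma_i^2 + \sigma^2
  \;\Longrightarrow\; \sigma_i^2 = \frac{\sigma^2}{1 - \alpha_i^2}.
\end{align*}
```

The second line takes expectations of both sides in steady state; the third takes variances, using the independence of $w_k$ and $x_k$.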

In order for the approximation to be valid, the length of the
intervals must be large relative to the standard deviations
$\sigma_i$ of the stationary densities, i.e.

$$(b_i - a_i) > 2m \, \sigma_i, \quad m \ge 3 \qquad (46)$$

where m may be chosen larger than the three-sigma range
to ensure that the contributions of the tails can be made as small
as desired. Under these assumptions the effect of the contracting
regions on the equation for $p_i(x)$ given in (38) may be obtained by
substituting the densities $q_i(x)$ in the right hand side, which
will show that the resulting density is approximately equal to
$q_i(x)$. The substitution in (38) yields

$$p_i(x) \approx q_i(x) \approx \int_{y \in I_i} \frac{1}{\sigma} \, q\!\left[\frac{x - \alpha_i y - \beta_i}{\sigma}\right] q_i(y) \, dy + \sum_{j \ne i} \frac{P_j}{P_i} \int_{y \in I_i} \frac{1}{\sigma} \, q\!\left[\frac{x - \alpha_i y - \beta_i}{\sigma}\right] q_j(y) \, dy \qquad (47)$$

It is easy but tedious to show that, due to the restrictions on the
relative sizes of the intervals and the variances given in (46),
the terms in (47) except the first can be made arbitrarily small
by properly selecting the value of m.

The  effect  of  the  expanding  regions,  which  have  small 
probabilities  P^,  on  (47)  can  also  be  made  as  small  as  desired. 

Finally, the resulting transition probabilities are given
by

$$\Pi_{ii} = 1 - \Phi[-(\mu_i - a_i)/\sigma_i] - \Phi[-(b_i - \mu_i)/\sigma_i] \qquad (48)$$

where $\Phi(x)$ is the cumulative unit Gaussian distribution, namely

$$\Phi(x) = \int_{-\infty}^{x} q(y) \, dy. \qquad (49)$$

In view of the assumption (46) it is seen from (48) that the
transition probabilities satisfy the postulated properties.
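The quantities in (44)-(46) and the retention probability (48) are simple to evaluate numerically. The sketch below is illustrative only: the function names and the returned tuple are introduced here, and the Gaussian distribution function is written in terms of the standard-library `math.erf`.

```python
import math

def Phi(x):
    """Cumulative unit Gaussian distribution, as in (49)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def contracting_region(alpha, beta, a, b, sigma, m=3):
    """Stationary mean (44), standard deviation from (45), the size
    check (46), and the retention probability (48) for one
    contracting interval (a, b) with |alpha| < 1."""
    mu = beta / (1.0 - alpha)
    s = sigma / math.sqrt(1.0 - alpha ** 2)
    large_enough = (b - a) > 2 * m * s
    stay = 1.0 - Phi(-(mu - a) / s) - Phi(-(b - mu) / s)
    return mu, s, large_enough, stay
```

As a usage example, take slope -0.2, noise parameter 3, and the interval (1, +infinity), with the offset 10.2 obtained by assuming the map is continuous at x = 1 with inner slope 10 (an assumption made here, not stated in the text). Then `contracting_region(-0.2, 10.2, 1.0, math.inf, 3.0)` gives a mean of 8.5 and a retention probability of roughly 0.99.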

These results can be summarized by the following theorem.

Theorem 14: Let the system model be given by (28)-(29) and let
the contracting intervals satisfy the assumption (46); then the
stationary density on these intervals, $p_i(x)$, can be approximated
by the stationary density of the switched Markov model (36).

3.2.  Expanding  Intervals 

The expanding intervals in the switched Markov model do not
exhibit a stationary density, due to their instability. In this
case we can approximate the density on these regions by using a
first order approximation. The approximation is based on using
only the first order term of the transition probabilities to the
expanding regions in the switched Markov model. The approximate
density $q_i(x)$ of the expanding intervals in the Markov model is
given by

$$q_i(x) \approx \sum_j P_j \, \Pi_{ji} \, q_{ji}(x) + \text{higher order terms} \qquad (50)$$

where the $q_{ji}(x)$ are Gaussian densities with means $\mu_{ji}$ and
variances $\sigma_{ji}^2$ given by the expressions

$$\mu_{ji} = \alpha_i \mu_j + \beta_i \qquad (51)$$

$$\sigma_{ji}^2 = \sigma^2 + \alpha_i^2 \sigma_j^2 \qquad (52)$$

If we now substitute the $q_i(x)$ for the contracting regions in the
right hand side of (38), we obtain a solution for the $p_i(x)$ of the
expanding regions on the left side of (38). The resulting
solution can be made arbitrarily close to the solution of the
Markov model provided the following two conditions are satisfied
for expanding regions:

$$(b_i - a_i) \ll \sigma \qquad (53)$$

$$|\alpha_i| (b_i - a_i) > 6 \sigma \qquad (54)$$

These conditions imply that the regions have to be small but with a
high slope. The results show that the transition
probabilities $\Pi_{ii}$ are indeed very small for expanding regions.
The result can be summarized in the following theorem.

Theorem 15: Under the conditions of Theorem 14, if
conditions (53)-(54) are also satisfied, then the density function of
the expanding intervals of the system (28)-(29) can be
approximated by the switched Markov model.

In  order  to  cover  the  case  of  small  expanding  region  but  with 
relatively  small  slope,  it  is  possible  to  use  series  expansions  as 
shown  in  [19].  The  series  expansion  given  in  [19]  indicates  that 
the  small  region  can  be  absorbed  by  the  adjoining  regions  as  a 
first  order  approximation.  The  approximation  is  also  valid  for 
small  contracting  regions.  The  results  can  be  used  to  construct 
approximate  nonlinear  filters  for  such  nonlinear  systems  as 
described  in  [20]. 

4.  EXAMPLE 

A first order example with three regions is considered. The
expressions for the resulting densities are explicitly derived.
The nonlinearity is assumed to be symmetric, so that only
densities for positive values of x need to be displayed. The
three regions are normalized, with the expanding region being (-1,
+1) and having a slope of $\alpha_0$, while the contracting regions are
(+1, $+\infty$) and ($-\infty$, -1) and have a slope of $\alpha_1$. The equilibrium
point for the positive contracting region is at

$$x^* = (\alpha_0 - \alpha_1)/(1 - \alpha_1) > 1 \qquad (55)$$

since $\alpha_0 > 1$ and $-1 < \alpha_1 < 1$. The value of the noise variance $\sigma$
and the values of the parameters $\alpha_i$ can be selected to verify
the assumptions made on the validity of the approximation. The
system satisfies the assumptions discussed in the paper if $\sigma \ll
(x^* - 1)$. The nonlinear system was simulated numerically and the
transition probability of the actual system was computed and
compared to the Markov model. The two are in agreement for $\sigma$ as
high as 10 when the slopes are selected as $\alpha_0 = 10$ and $\alpha_1 = -0.2$,
even though the assumptions require $\sigma$ to be smaller than 10 by a
factor of 3. The transition probabilities for these cases and the
corresponding stationary probabilities are shown in Table 1. The
resulting distribution for one of the cases is shown in Figure 1.
The results corroborate the theoretical results discussed in the
paper.
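A simulation of this kind can be sketched as follows. This is not the authors' simulation code: the offset of the outer branches is chosen here so that the map is continuous at x = +1 and x = -1 (an assumption consistent with the equilibrium (55)), and the function names, step count, and seed are illustrative.

```python
import random

def f(x, a0=10.0, a1=-0.2):
    """Symmetric three-region map: slope a0 on (-1, 1), slope a1
    outside, with offsets chosen (assumption) for continuity."""
    if abs(x) < 1.0:
        return a0 * x
    return a1 * x + (a0 - a1) * (1.0 if x > 0 else -1.0)

def region(x):
    """Macrostate index: 0 for x <= -1, 1 for |x| < 1, 2 for x >= 1."""
    return 0 if x <= -1.0 else (1 if x < 1.0 else 2)

def estimate_transitions(sigma, steps=200000, seed=0):
    """Row-stochastic estimate of the macrostate transition matrix
    obtained by counting transitions along one simulated trajectory."""
    rng = random.Random(seed)
    counts = [[0] * 3 for _ in range(3)]
    x = 0.0
    for _ in range(steps):
        x_next = f(x) + sigma * rng.gauss(0.0, 1.0)
        counts[region(x)][region(x_next)] += 1
        x = x_next
    return [[c / max(sum(row), 1) for c in row] for row in counts]
```

Calling `estimate_transitions(3.0)` returns a 3 x 3 matrix whose rows can be compared against the corresponding block of Table 1; the estimates fluctuate with the seed and the number of steps.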


TABLE 1.

The Transition Matrix $\Pi$ and the Stationary Probabilities P

            σ = 3                  σ = 5                  σ = 10

Π     .990  .008  .002       .928  .049  .023       .773  .046  .181
      .450  .100  .450       .443  .113  .443       .469  .063  .469
      .002  .008  .990       .023  .049  .928       .181  .046  .773

P    [.495  .010  .495]     [.474  .053  .474]     [.476  .048  .476]
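The stationary probabilities in Table 1 can be checked against the tabulated transition matrices through the relation (43). The sketch below reads each block of the table as a row-stochastic matrix (rows are renormalized because the printed entries are rounded) and iterates P <- P Π; the dictionary `TABLE_1` and the function name are introduced here for illustration.

```python
def stationary(Pi, iterations=5000):
    """Stationary row vector of a row-stochastic matrix by repeated
    application of P_i = sum_j P_j Pi[j][i], as in (43)."""
    rows = [[v / sum(r) for v in r] for r in Pi]  # absorb rounding
    n = len(rows)
    P = [1.0 / n] * n
    for _ in range(iterations):
        P = [sum(P[j] * rows[j][i] for j in range(n)) for i in range(n)]
    return P

# Entries as read from Table 1, keyed by the noise parameter.
TABLE_1 = {
    3: [[.990, .008, .002], [.450, .100, .450], [.002, .008, .990]],
    5: [[.928, .049, .023], [.443, .113, .443], [.023, .049, .928]],
    10: [[.773, .046, .181], [.469, .063, .469], [.181, .046, .773]],
}
```

For all three cases the computed vector agrees with the tabulated P to within about 0.002, which is the level of the rounding in the printed entries.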

5.  SUMMARY  AND  CONCLUSIONS 

In this paper an approximation for piecewise linear scalar
discrete-time systems was derived. It was based on the properties
of such systems in the deterministic case. In the stochastic case
the approximation depends on the relative size of the variance of
the driving Gaussian noise. The approximation may be extended to
multidimensional systems, and can be applied to the derivation of
nonlinear filtering schemes.


FIG. 1. The Distribution of the Stationary Density for $\sigma$ = 3.


APPENDIX C


PIECEWISE LINEAR MODELING OF MULTIDIMENSIONAL STOCHASTIC NONLINEAR SYSTEMS


A.  H.  HADDAD  AND  E.  I.  VERRIEST 

Georgia  Institute  of  Technology 

Atlanta,  GA  30332-0250 

ABSTRACT 

This  paper  extends  to  the  multi-dimensional  case  an  approximation  of 
nonlinear  dynamic  systems  with  random  inputs,  by  a  set  of  linear  models 
with  Markovian  transitions  among  the  models. 

SUMMARY 

The problem of piecewise linear approximation of nonlinear dynamic
systems subject to stochastic input has been considered for the scalar
case in [1-2]. The approach was used to obtain an approximate model for
the discrete-time nonlinear system described by Markovian transitions
among linear models. The resulting approximation was then utilized to
derive an approximate filtering scheme for the nonlinear system. The
analysis was restricted to the scalar case for simplicity, and to
illustrate the concepts. The extension to the multi-dimensional
case may appear to be simple in principle, but it presents several
difficulties. The objective of this paper is to provide a systematic
approach to the approximate multivariable modeling problem for nonlinear
systems driven by stochastic inputs. The paper also briefly describes the
applications of the model to the approximate solution of the nonlinear
filtering problem.

The discrete-time nonlinear system considered has the form

$$x_{k+1} = f(x_k) + w_k \qquad (1)$$

where $x_k$ is the n-dimensional state vector, and $\{w_k\}$ is a white Gaussian
noise sequence with zero mean and covariance matrix Q. The nonlinearity
f(x) is assumed to be given by a piecewise linear approximation

$$f(x) = A_i x + b_i, \quad x \in \Omega_i. \qquad (2)$$

The properties and nature of the approximating model for the scalar case
depend on the stability or instability of the linear models, and the size
of their regions, $\Omega_i$, relative to the noise standard deviation. In order
to extend the notion of stability and size of the regions to the
multi-dimensional case, a new definition of stability is needed, and a
more systematic approach to the region definition is required.

First a measure $\mu(\cdot)$ on $R^n$ is defined relative to the noise
covariance matrix Q. If the covariance matrix is singular, the measure may
be based on the covariance of the state of an equivalent linear system
that is obtained from (1) by least squares approximations. The regions may
now be classified as contracting or expanding (as opposed to stable or
unstable), using modified assumptions. A region $\Omega_i$ is said to
be Contracting (C) if it satisfies the assumption:

(C1) $\mu(f(\Omega_i)) \le \mu(\Omega_i)$

Similarly, a region $\Omega_i$ is said to be Expanding (E) if it satisfies

(E1) $\mu(f(\Omega_i)) > \mu(\Omega_i)$

where the notation $f(\Omega_i)$ is used for the image of the region under
the transformation f(x). The class of nonlinearities considered here is
further restricted to satisfy:

(C2) $f(\Omega_i) \subset \Omega_i$    (C3) $\mu(\Omega_i) \gg 1$

(E2) $\Omega_i \subset f(\Omega_i)$    (E3) $\mu(\Omega_i) \ll 1$

for contracting and expanding regions, respectively. These assumptions
(similar to the scalar case) restrict the class of nonlinearities, but are
required for the validity of the approximate Markov model.
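As a rough illustration of the contracting/expanding classification, the sketch below labels the affine pieces of a two-dimensional system. It assumes that the measure is a volume-type measure, so that the image of a region under $x \to A_i x + b_i$ has measure $|\det A_i|$ times that of the region; the summary does not state this, and the matrices and the names `det2` and `classify_region` are purely hypothetical.

```python
def det2(A):
    """Determinant of a 2x2 matrix given as nested lists."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def classify_region(A):
    """Label an affine piece x -> A x + b as contracting ("C") or expanding ("E").

    Assumption (not from the source): mu is a volume-type measure, so
    mu(f(Omega)) = |det A| * mu(Omega), and (C1)/(E1) reduce to a test
    on |det A|.
    """
    return "C" if abs(det2(A)) <= 1.0 else "E"


# Hypothetical system matrices for three regions.
pieces = {
    "Omega_1": [[0.5, 0.1], [0.0, 0.8]],  # |det| = 0.4
    "Omega_2": [[2.0, 0.0], [0.3, 1.5]],  # |det| = 3.0
    "Omega_3": [[0.5, 0.0], [0.0, 3.0]],  # |det| = 1.5, yet contracts along x1
}

labels = {name: classify_region(A) for name, A in pieces.items()}
print(labels)
```

The third piece shows a limitation of any scalar test of this kind: a map that contracts in one direction and expands in another receives a single label that hides the mixed behavior.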


The next step is then to derive the Markov transition probabilities
among the linear models given in (2), after the restriction of (2) to the
regions $\Omega_i$ is removed. The model parameters are obtained by
micro-partitioning the space $R^n$ into cells whose measure is commensurate
with unity relative to the covariance Q. The transition probabilities for
a contracting region can then be approximated by considering the number of
cells in an expanding region that forms the closure of the mapping of the
original contracting region under f. Similarly, the transition
probabilities for an expanding region can be obtained via the number of
cells that are contained in the contracting region forming the mapping of
the expanding region under the transformation f when the noise effects are
included.
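The cell-counting construction is only outlined in this summary. As a stand-in, the sketch below estimates the region-to-region transition probabilities empirically, by simulating a toy two-dimensional piecewise linear system and counting transitions. The three-strip partition, the matrices `A`, the offsets `b`, and the unit noise level are all assumptions chosen for illustration; the toy does not attempt to satisfy (C3) or (E3).

```python
import random


def region(x):
    """Index of the region containing x: a hypothetical 3-strip partition on x[0]."""
    if x[0] < -1.0:
        return 0
    if x[0] <= 1.0:
        return 1
    return 2


# Hypothetical affine pieces f(x) = A[i] x + b[i], continuous at x[0] = -1 and x[0] = 1.
A = [
    [[0.5, 0.0], [0.0, 0.5]],  # outer strip, contracting
    [[2.0, 0.0], [0.0, 0.5]],  # middle strip, expanding along x[0]
    [[0.5, 0.0], [0.0, 0.5]],  # outer strip, contracting
]
b = [[-1.5, 0.0], [0.0, 0.0], [1.5, 0.0]]


def step(x, rng, sigma=1.0):
    """One step of x_{k+1} = f(x_k) + w_k with independent Gaussian noise components."""
    i = region(x)
    return [
        A[i][r][0] * x[0] + A[i][r][1] * x[1] + b[i][r] + rng.gauss(0.0, sigma)
        for r in range(2)
    ]


def estimate_transitions(n_steps=50000, seed=0):
    """Empirical transition frequencies between regions along one long trajectory."""
    rng = random.Random(seed)
    counts = [[0] * 3 for _ in range(3)]
    x = [0.0, 0.0]
    for _ in range(n_steps):
        x_next = step(x, rng)
        counts[region(x)][region(x_next)] += 1
        x = x_next
    return [[c / max(1, sum(row)) for c in row] for row in counts]


P = estimate_transitions()
for row in P:
    print(["%.3f" % p for p in row])
```

In this toy the expanding middle strip is left quickly while the contracting outer strips are left rarely, which is the qualitative pattern the approximate Markov model relies on.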

The resulting model is similar to the scalar case except that the
transitions are no longer restricted to neighboring regions only, as
discussed in [2]. The assumptions (C) and (E) ensure that the transition
matrix is relatively sparse. One problem which remains to be
resolved is the case when the matrix is contracting in one direction
but expanding in another. A preliminary investigation of this aspect is
under way.

The next phase involves the application of the model to nonlinear
filtering problems. Here again, a departure from the scalar case occurs
due to the dependence of the transition probabilities on the value of x.
The state of the system is determined by x, its covariance P under the
appropriate linear model, and the probabilities $p_i$ of the ith linear
model. The time update in the filtering problem solution is based on
updating P corresponding to the model in effect for the given state, and
then updating the probability vector p based on the ellipsoid defined by
P, the state estimate, and the allowable regions for transitions. The
overall estimate update is obtained as a weighted sum over the allowable
transition regions resulting from the current region and P. The
measurement update follows similarly by updating the estimate based on the
possible models, while the update of the probability vector p is more ad
hoc. The update of p is based on a weighted sum of the steady state
probability obtained from the original Markov model, and the possible
transitions determined from the estimate of x and its covariance under the
ith model. Additional study of the resulting filter and its properties is
needed.

There are two additional major problems to be considered. The first
involves relaxing some of the restrictions that limit the types of
nonlinearities which can be used. The second involves the
identification problem when the nonlinearity is unknown. In both cases,
the dimensionality problem can be resolved as in the scalar case by
utilizing the sparseness of the transition matrix.
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This  paper  is  concerned  with  the  problem  of 
nonlinear filtering for piecewise linear uncertain
dynamic systems. The system is approximated
by a set of linear models with Markovian
transitions. The resulting multi-model nonlinear
filtering scheme is implemented by allowing only
slow or fast transitions among the models, based
on the stability or instability of the linear
models involved. This implementation reduces the
exponential increase in the complexity of such
systems. The scheme does not assume exact a
priori knowledge of the transition probabilities
involved in the model.

1. INTRODUCTION

Nonlinear discrete-time models, which
usually model sampled versions of continuous-time
systems, are very common in many applications of
nonlinear filtering schemes. Many ad hoc
approximations have been proposed for such systems.
Piecewise linear approximation of nonlinear
systems is one such ad hoc procedure, which has
the potential of being amenable to systematic
analysis in terms of the quality of the
approximation. The systematic nature of the approach
stems from the fact that once the model is
approximated the error can be arbitrarily
controlled by using additional terms or higher order
approximations. The general idea is based on
approximating the piecewise linear model by a set
of linear models with Markov transitions among
the models. An application of the switched
Markov model to particular systems can be found
in [1]. In other nonlinear filtering problems it
has been primarily applied to systems with time
varying and unknown parameters, as discussed in
the surveys found in [2-4]. The basis of this
paper is the one-dimensional case discussed in
[5], in which the Markov linear approximation is
analyzed. The required assumptions on the
quality of the approximations are utilized to reduce
the complexity of the resulting nonlinear
filtering problem. For simplicity, the system will
still be restricted to the scalar case, even
though direct extension to the vector case is
possible.

We are concerned with the following system model

$$x_{k+1} = g(x_k) + w_k, \quad k = 0,1,2,\ldots \tag{1}$$

where the nonlinearity g(x) is assumed to have
the form

$$g(x) = \beta_i + \frac{\beta_{i+1} - \beta_i}{\alpha_{i+1} - \alpha_i}\,(x - \alpha_i) = a_i x + b_i, \quad \alpha_i < x < \alpha_{i+1}, \quad i = 1,2,\ldots,M \tag{2}$$


and where $\{w_k\}$ is a white Gaussian noise
sequence with variance Q. It is desired to solve
the nonlinear filtering problem of estimating the
state of the system while the values of
the cutoff points of the nonlinear regions are
not precisely known. One may also assume that
the piecewise linear models are not precisely
known, and that the expressions given in (2) are
merely a quantized approximation to the unknown
linear levels. The observations are assumed to
be linear, of the form

$$y_k = c\,x_k + v_k, \quad k = 0,1,2,\ldots \tag{3}$$

where the noise sequence $\{v_k\}$ is also white
Gaussian with variance R. The problem can be
further generalized to allow a piecewise linear
observation model, in which case the Markov
linear approximation can also be used for the
observations. However, this will not be
done here to conserve space.
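To make the piecewise linear form concrete, the sketch below computes the slope and intercept of each linear segment from a list of breakpoints and the values of the nonlinearity at those breakpoints, and then evaluates g(x). The breakpoint and level values, and the choice to extend the end segments beyond the outer breakpoints, are assumptions made only for this example.

```python
from bisect import bisect_right


def segments(alpha, beta):
    """Slopes a_i and intercepts b_i of the lines joining (alpha_i, beta_i)."""
    a, b = [], []
    for i in range(len(alpha) - 1):
        slope = (beta[i + 1] - beta[i]) / (alpha[i + 1] - alpha[i])
        a.append(slope)
        b.append(beta[i] - slope * alpha[i])
    return a, b


def g(x, alpha, a, b):
    """Evaluate the piecewise linear map; the end segments are extended outward."""
    i = bisect_right(alpha, x) - 1
    i = min(max(i, 0), len(a) - 1)
    return a[i] * x + b[i]


# Hypothetical breakpoints and levels: two stable outer segments (|slope| < 1)
# separated by one unstable, high-gain middle segment (|slope| > 1).
alpha = [-3.0, -1.0, 1.0, 3.0]
beta = [-2.5, -2.0, 2.0, 2.5]
a, b = segments(alpha, beta)

print(a, b)
print(g(0.5, alpha, a, b))
```

With these values the three segments have slopes 0.25, 2.0 and 0.25, so the example has the stable/unstable/stable layout that the switched Markov approximation assumes.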

The objective, therefore, is to obtain a
nonlinear filtering scheme which does not require
explicit identification of the nonlinearity. The
approach is based on approximating the
nonlinearity by a linear switched parameter Markov model
[5]. One could also apply the same approximation
to the observation process. The result would be
similar to the uncertain observations case
discussed, for example, in [6-8].

The next section discusses the Markov
switched approximation as derived in [5], with
particular emphasis on the restrictions imposed
on the nonlinearity that allow the approximation
to be valid. Next the general filter structure
is summarized, and the need for assumptions to
reduce its exponential complexity growth is
discussed. The paper then discusses the approach
used and the restrictions imposed on the
nonlinear model that allow the reduction in
complexity by using only fast and slow transitions.
The paper concludes with a discussion of possible
extensions and generalizations.


•This  work  is  supported  by  the  U.S.  Air  Force 
Armament  Laboratory  under  Contract  F08635-84-C- 
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2.  LINEAR  MARKOV  APPROXIMATION 


For simplicity we restrict the discussion to
the scalar case as was done in [5], where the
linear Markov approximation was investigated.
Here we summarize the results in order to provide
the basis for the approximations and assumptions
used in the sequel. The linear Markov
approximation is based on defining macro states $S_i$,
(i = 1, ..., M), such that if the system is in state $S_i$,
then it follows the linear model

$$x_{k+1} = a_i x_k + b_i + w_k, \quad S(k) = S_i \tag{4}$$

If the state $S_i$ represents the region
($\alpha_i < x_k < \alpha_{i+1}$), then the transition probabilities among
the macro states are, of course, dependent on the
value of $x_k$. The basic approximation consists
in assuming that the sequence of the macro states
forms a Markov chain whose transition probability
matrix has elements

$$\pi_{ij} = P\{S(k+1) = S_j \mid S(k) = S_i\} \tag{5}$$

which are not dependent on $x_k$, and that the model
(4) is valid for all values of $x_k$.

Such an approximation would not be an
improvement were it not for the special cases
that lead to simplified models. In particular,
three cases can be discerned, as was discussed
in [5].

1. The first case occurs when in a given state
the original system contains a relatively
large stable region. In this case, the
transition probability to neighboring
regions is very small, and hence one can
assume that some form of a steady state may
be reached while in that state.
Furthermore, when constructing all the possible
transitions in a given time interval, one
may assume a relatively small number of
transitions.

2. The second case occurs when the original
system contains a relatively small unstable
region, having a high gain and bounded by
two stable regions. In this case it has
been shown that the probability of
transition to other states is very high. Hence,
one may assume that the system leaves such a
state in only a few sampling instants.

3. The third case considered occurs when a
stable region is relatively small and has
two other neighboring stable regions. In
this case one can write a series expansion
for the system properties, and it can be
shown that asymptotically, the small region
can be absorbed by the larger neighboring
regions for modeling purposes.

The only case which is not included in the
three cases given above is the case of a large
unstable region. However, such a case may cause
other problems to the system, and thus we shall
restrict our considerations to a system model
displaying these three cases of behavior.

In general, these restrictions should not
pose any difficulties, as several adjacent linear
stable regions may be combined and approximated
by one large stable linear region. Consequently,
the general form of the approximating model
involves a series of large linear stable regions,
with relatively small unstable linear regions
separating them. It should be noted that the
unstable regions require a relatively large slope
for the approximation to be valid. However, if
the slope is not large enough, again the
asymptotic series expansion applies. In such a case,
the region may also be absorbed by the
neighboring stable regions.

The overall approximating model for this
type of piecewise linear nonlinearity becomes a
switched Markov model, with transitions between
alternating stable and unstable linear systems.
The approximate transition probability is very
large for the unstable states, and very small for
the stable states. If one also assumes that
transitions beyond the immediate neighbors can be
neglected (due to continuity assumptions on the
original nonlinear system, and the relative sizes
of the regions), then one obtains the following
form for the transition probability matrix

$$\pi_{ii} = \begin{cases} 1 - \epsilon_i, & i = 1,3,\ldots,M \\ \epsilon_i, & i = 2,4,\ldots,M-1 \end{cases} \tag{6}$$

with the neighboring entries $\pi_{i,i\pm1}$, for $i = 3,5,\ldots,M-2$
and $i = 2,4,\ldots,M-1$, carrying the remaining probability of each row,
where the $\epsilon_i$'s denote small positive parameters,
and where all the other $\pi_{ij}$ are assumed to be
negligible, or identically zero. The number of
models, M, is of course odd, so that the overall
system is represented by a stable model. The odd
macro states represent the large regions, while
the even ones represent the small unstable
regions of the original nonlinearity. This
simplified structure will result in a significant
reduction in the complexity of the nonlinear
filtering scheme which is based on the switched
Markov linear model.
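The tridiagonal structure of the transition probability matrix can be sketched as follows. The diagonal follows the pattern described in the text (large self-transition probability for the odd, stable macro states and small for the even, unstable ones). How the remaining probability is divided between the two neighbors is not legible in the source, so the equal split used here, like the function name `transition_matrix` and the numerical values, is an assumption made for illustration.

```python
def transition_matrix(eps):
    """Tridiagonal transition matrix for M = len(eps) alternating macro states.

    States are numbered 1..M; odd states are stable (self-transition 1 - eps_i),
    even states are unstable (self-transition eps_i). The leftover probability
    is split equally between the existing neighbors (an assumed split).
    """
    M = len(eps)
    if M % 2 == 0:
        raise ValueError("M must be odd so that both end states are stable")
    P = [[0.0] * M for _ in range(M)]
    for i in range(M):
        state = i + 1
        stay = 1.0 - eps[i] if state % 2 == 1 else eps[i]
        P[i][i] = stay
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < M]
        for j in neighbors:
            P[i][j] = (1.0 - stay) / len(neighbors)
    return P


# Hypothetical small parameters for M = 5 macro states.
eps = [0.02, 0.05, 0.01, 0.05, 0.03]
Pi = transition_matrix(eps)
for row in Pi:
    print(["%.3f" % p for p in row])
```

Each row sums to one, and entries more than one step off the diagonal are exactly zero, which is the sparsity the filter design later exploits.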


3.  GENERAL  FILTER  STRUCTURE 


The optimal filter for the switched Markov
linear model has been derived by many authors, as
can be found in the surveys [2-4]. It consists
in general of a weighted sum of multiple Kalman
filters in parallel and may be written as

$$\hat{x}(k) = \sum_{j=1}^{M_k} \hat{x}_j(k)\, P\{I_j(k) \mid Y_k\} \tag{7}$$

where

$$\hat{x}_j(k) = E\{x_k \mid Y_k, I_j(k)\} \tag{8}$$

$$Y_k = \{y_0, y_1, \ldots, y_k\} \tag{9}$$

and where $I_j(k)$ is a specific sequence of macro
states $\{S(i)\}$ of the Markov chain during the
observation interval (i = 0,1,...,k). Thus

$$I_j(k) = \{S_{j_0}, S_{j_1}, \ldots, S_{j_k}\}, \quad 1 \le j_i \le M \tag{10}$$

where in the above sequence $S(i) = S_{j_i}$ is the
state of the system at time i, and hence $M_k = M^k$.
In the sequel we shall denote $S_{j_i}$ simply
by $j_i$.

Here $\hat{x}_j(k)$ is implemented as a linear Kalman
filter matched to a given sequence of transitions
in the Markov model, and $P\{I_j(k) \mid Y_k\}$ is the
generalized likelihood functional of that
sequence:

$$P\{I_j(k) \mid Y_k\} = \frac{P\{Y_k \mid I_j(k)\}\, P\{I_j(k)\}}{\sum_{m=1}^{M_k} P\{Y_k \mid I_m(k)\}\, P\{I_m(k)\}} \tag{11}$$

where $P\{Y_k \mid I_j(k)\}$ is Gaussian, due to the
linearity, for a given sequence $I_j(k)$, and thus is
easily obtained from the Kalman filters $\hat{x}_j(k)$ and
their covariances. The $P\{I_j(k)\}$ are directly
given from the transition probabilities (6).
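The weighted sum of sequence-matched Kalman filters can be sketched for a short scalar record as follows. The sketch enumerates every macro-state sequence, runs a Kalman filter matched to it, and weights the estimates by the normalized product of data likelihood and sequence prior. The model values, the uniform initial macro-state probabilities `p0`, and the convention of one model index per transition (so k indices for k + 1 observations) are assumptions for illustration only.

```python
import math
from itertools import product


def kalman_for_sequence(seq, ys, a, b, c, Q, R, x0, P0):
    """Kalman filter matched to one macro-state sequence.

    Returns the final filtered estimate and the log-likelihood of the data,
    accumulated from the Gaussian innovations.
    """
    x, P, loglik = x0, P0, 0.0
    for k, y in enumerate(ys):
        if k > 0:  # time update with the model in force at step k - 1
            i = seq[k - 1]
            x, P = a[i] * x + b[i], a[i] ** 2 * P + Q
        S = c * c * P + R          # innovation variance
        nu = y - c * x             # innovation
        loglik -= 0.5 * (math.log(2.0 * math.pi * S) + nu * nu / S)
        gain = P * c / S
        x, P = x + gain * nu, (1.0 - gain * c) * P
    return x, loglik


def sequence_prior(seq, p0, Pi):
    """Prior probability of a macro-state sequence under the Markov chain."""
    p = p0[seq[0]]
    for i, j in zip(seq, seq[1:]):
        p *= Pi[i][j]
    return p


def optimal_estimate(ys, a, b, c, Q, R, x0, P0, p0, Pi):
    """Weighted sum over all sequences with nonzero prior (needs len(ys) >= 2)."""
    M, n = len(a), len(ys) - 1
    terms = []
    for seq in product(range(M), repeat=n):
        prior = sequence_prior(seq, p0, Pi)
        if prior == 0.0:
            continue
        xj, ll = kalman_for_sequence(seq, ys, a, b, c, Q, R, x0, P0)
        terms.append((xj, math.exp(ll) * prior))
    total = sum(w for _, w in terms)
    weights = [w / total for _, w in terms]
    x_hat = sum(xj * w for (xj, _), w in zip(terms, weights))
    return x_hat, weights


# Hypothetical three-model example (stable, unstable, stable).
a = [0.25, 2.0, 0.25]
b = [-1.75, 0.0, 1.75]
Pi = [[0.95, 0.05, 0.0], [0.45, 0.10, 0.45], [0.0, 0.05, 0.95]]
p0 = [1.0 / 3.0] * 3
ys = [1.8, 2.1, 2.4, 2.2]

x_hat, weights = optimal_estimate(ys, a, b, 1.0, 0.25, 0.25, 0.0, 1.0, p0, Pi)
print(x_hat, len(weights))
```

Even in this tiny case the number of surviving sequences grows with every new observation, which is the complexity problem the remainder of the section addresses.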

Such a solution is, of course, optimal and
depends on the entire past record. However, it
is impractical due to the exponential increase in
the number of filters required as the observation
interval increases. Many approaches have been
derived in the past to avoid the increase in
complexity, including random sampling, Gaussian
approximations, and other criteria for truncating
the number of filters [3,7]. In this paper we
utilize the properties of the model as described
in the previous section to reduce the complexity
of the filter. These properties follow from the
fast and slow nature of the Markov chain
transitions given in (6). In this case, the
approximation is based on the properties of the piecewise
linear nonlinearity and its switched Markov
linear model.

An initial reduction in the number of
filters $M_k$ is directly obtained from the tridiagonal
structure of the transition probability matrix
(6). In this case the total number of filters is
approximately

$$M_k \approx 3^k M \tag{12}$$

However, this result does not take into account
the end effects (states $S_1$ and $S_M$) or the
relative size of the transition probabilities.

The simplified scheme involves two sets of
filters: a slow set representing the
slowly switching stable linear regions, and a
fast set representing the fast switching
unstable regions. The length of the observation
interval used is also truncated to be compatible
with the time constants of the slowest linear
submodel. In this manner, each filter in the
slow set may start with any of the stable
linear models, but may include at most one
switching in the interval, to one of two
neighboring unstable linear submodels. Similarly,
each fast filter for the unstable mode may only
remain a small finite number of samples in the
fast state before it switches to one of two
neighboring stable linear models.

We now obtain an estimate of the number of
required filters under the above assumptions.
If K/2 is the number of samples
required for the stable submodels to reach steady
state, and if the unstable submodels are assumed
to remain in the unstable mode at most L samples,
then the number of required filters is
approximately 2KL(M-1). This expression will be
clarified when the filter structure is described in
the next subsection. This approximation is
derived under the assumption that no more than
one switching occurs in the interval of K samples
for the stable states. Consequently, for L = 1,
namely, when the system switches immediately from a
fast state, after only one sample, only
2K(M-1) filters are required. However, even this
linearly increasing number may be too large. In
that case, alternative approaches to the
detection of a transition can be employed, as in
failure detection schemes or the detection of
incident processes.
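The three counts quoted in this section can be compared numerically. The short sketch below evaluates the full enumeration $M^k$, the tridiagonal estimate $3^k M$, and the slow/fast estimate 2KL(M-1) for one set of values; the particular numbers are arbitrary and chosen only to show the orders of magnitude.

```python
def filter_counts(M, k, K, L):
    """Number of filters under the three counting rules discussed in the text."""
    return {
        "all_sequences": M ** k,            # every macro-state sequence
        "tridiagonal": 3 ** k * M,          # at most three successors per state
        "slow_fast": 2 * K * L * (M - 1),   # at most one jump per K samples
    }


counts = filter_counts(M=5, k=10, K=10, L=1)
print(counts)
```

The first two counts grow exponentially with the record length k, whereas the last depends only on the fixed window length K, the dwell bound L and the number of models M.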

3.1  The  Slow  Filters  Structure 

The  slow  filters  are  implemented  by  using 
one  of  the  (M+l)/2  stable  linear  models.  At  the 
starting  phase,  the  filters  used  are  those  corre¬ 
sponding  to  the  assumed  initial  conditions.  As  a 
matter  of  fact,  if  the  initial  conditions  are 
known,  then  only  one  slow  filter  needs  to  be  used 
at  the  starting  phase,  and  then  it  may  switch  to 
other  modes  at  future  time  instants.  However,  if 
the  initial  condition  is  also  assumed  unknown, 
then  the  partitioning  rule  applies  to  the  initial 
condition,  so  that  at  the  start  each  of  the  slow 
filters  assumes  an  initial  condition  equal  to  the 
steady  state  mean  value  of  the  model  for  that 
region. 

The basic assumption used in the
implementation is that during an observation interval of
length K time samples, each slow filter may
experience at most one jump at some arbitrary
instant $\ell$. If a jump occurs, the filter switches
into one of two adjacent fast macro states.
However,  since  it  is  assumed  that  the  system 
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escapes  the  fast  states  after  one  sample  (this 
assumption can be relaxed, by allowing more than
one sample in the fast states) to switch to
another adjacent slow state, the resulting slow mode
filters may be divided into three different basic
types:

The first involves the filters with no jumps
during the observation interval, and these are
denoted by

$$\hat{x}_{i,0}(k) = E\{x_k \mid Y_k, I_{i,0}(k)\}, \quad i = 1,3,\ldots,M \tag{13}$$

where the sequence $I_{i,0}(k)$ is defined by the
following states

$$I_{i,0}(k) = \{S(j) = i,\ 0 \le j \le k\} \tag{13a}$$

The second involves the filters experiencing a
jump at the $\ell$-th time sample, $1 \le \ell \le k-2$, where
the system moves to a different slow mode after
the transition to the fast mode, and these are
denoted by

$$\hat{x}_{i,i\pm2}(k|\ell) = E\{x_k \mid Y_k, I_{i,i\pm2}(k|\ell)\}, \quad i = 1,3,\ldots,M \tag{14}$$

where the sequence $I_{i,i\pm2}(k|\ell)$ is defined by the
following states

$$I_{i,i\pm2}(k|\ell) = \{S(j) = i,\ 0 \le j < \ell;\ S(\ell) = i\pm1;\ S(j) = i\pm2,\ \ell < j \le k\}, \quad i = 1,3,\ldots,M \tag{14a}$$

The third involves the filters which return to
the original slow mode state after a jump to an
adjacent fast state, and these are denoted by

$$\hat{x}_{i,i\pm}(k|\ell) = E\{x_k \mid Y_k, I_{i,i\pm}(k|\ell)\}, \quad i = 1,3,\ldots,M \tag{15}$$

where the sequence $I_{i,i\pm}(k|\ell)$ is defined by the
following states

$$I_{i,i\pm}(k|\ell) = \{S(j) = i,\ j \ne \ell;\ S(\ell) = i\pm1\}, \quad i = 1,3,\ldots,M \tag{15a}$$

It is seen that this last type may be obtained in
two distinct ways depending on the adjacent fast
states. In all the above cases, for the end
states i = 1, M the set of filters needs to
include only the allowed transitions.

The number of filters required to implement
the scheme over the observation interval K may
be obtained from all the distinct possibilities
defined in (13)-(15), and by including the
assumption that the system switches back from the
fast states after sample $\ell + j$, $1 \le j \le L$. If we
start with initial conditions in only one region,
then (13) yields only one filter, while (14) and
(15) each provide $2K\left(\frac{M-1}{2}\right)L$ filters, where only
one-sided jumps are allowed in states i = 1, M.
Here the factor 2 stems from the two possible
transitions in each of (14) and (15), the $(M-1)/2$
is the total number of slow states less one (to
account for the end states), and the K and L
represent the possible samples where jumps can
occur. Hence, the approximate number of filters
is given by the expression 2KL(M-1) as mentioned
above.

The basic structure, therefore, involves a
set of slow filters which are interrupted at one
time sample $\ell$ by a jump to a fast mode filter,
where $\ell$ may vary over the observation interval.
The actual implementation of these filters is
obtained via standard Kalman filters, with a
change in initial conditions and gains after the
transitions corresponding to the parameters of
the new state. Furthermore, the fast filters
may be replaced by an initial condition change
(as obtained from the application of singular
perturbation theory), so that as a first-order
approximation there may be no need to implement
the fast filters.

The problem is how to prevent the growth,
albeit linear growth, in the number of filters
for k > K. It can be easily shown that the slow
stable filters are stable and approach steady
state faster than the state of the original
system. We shall assume that the maximum time
required to approach the steady state for the
stable filters is K/2. Consequently, under the
above assumptions, for $k - \ell \ge K/2$, the slow
filters in (13)-(15) which correspond to the same
final state, namely with S(k) = i, will be
aggregated into a single filter. The filters are
combined, and their likelihood functionals are added
together. The process after the aggregation
continues as at the initial state, so that the
number of required filters remains stable.
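The aggregation step can be sketched as follows. Filters that end in the same macro state are merged into one, and their likelihood weights are added, as the text describes. The text does not say how the estimates and covariances of the merged filters are combined, so the moment-matching rule used here (weighted mean, plus spread-of-means term in the variance) is an assumption, as are the tuple layout and the sample numbers.

```python
from collections import defaultdict


def aggregate(filters):
    """Merge scalar filters that share a final macro state.

    `filters` is a list of (final_state, weight, mean, variance) tuples.
    Weights of merged filters are added; mean and variance are combined by
    moment matching (an assumed rule, not taken from the source).
    """
    groups = defaultdict(list)
    for state, w, m, v in filters:
        groups[state].append((w, m, v))
    merged = []
    for state, items in sorted(groups.items()):
        W = sum(w for w, _, _ in items)
        mean = sum(w * m for w, m, _ in items) / W
        var = sum(w * (v + (m - mean) ** 2) for w, m, v in items) / W
        merged.append((state, W, mean, var))
    return merged


# Hypothetical filter bank: two filters end in state 1, one in state 3.
bank = [(1, 0.2, 1.0, 0.5), (1, 0.2, 3.0, 0.5), (3, 0.6, -2.0, 0.4)]
merged = aggregate(bank)
print(merged)
```

After merging, the bank holds at most one filter per slow state, so the process can restart from the same size as at the initial time.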


3.2  The  Fast  Filter  Structure 


The fast filters occur only during the
transition from one slow state to another slow state.
With the assumption that the system does not stay in a
given fast state more than one sample, the resulting
filter only involves at most a single time
varying stage in the transition between two slow
filters. However, if a continuous-time system is
to be implemented, or a faster sampling rate is
available, then the fast filters may be
implemented at a fast stretched time scale. Namely,
when the fast filters are implemented, a faster
sampling rate may be used.


If only one sample is assumed in the fast
mode filters, the fast filters simply provide the
transitional mode between two slow states. If
the transition occurs at the $\ell$-th time sample,
the estimate corresponding to the fast mode may
be denoted by

$$\hat{x}_{i,i\pm1}(\ell) = E\{x_\ell \mid Y_\ell, I_{i,i\pm1}(\ell)\}, \quad i = 1,3,\ldots,M \tag{16}$$

where the sequence $I_{i,i\pm1}(\ell)$ is used to denote
the following state transitions

$$I_{i,i\pm1}(\ell) = \{S(j) = i,\ 0 \le j < \ell;\ S(\ell) = i\pm1\} \tag{16a}$$

It therefore can be implemented by a Kalman
filter with initial conditions $\hat{x}_{i,0}(\ell)$ with
corresponding covariances, and gains and
covariance equation obtained from the parameters of
the fast state $S(\ell) = i \pm 1$. The new estimate
obtained in (16), and its covariance, serve as
initial conditions for the continued processing
of the slow filters (14)-(15).

The actual estimate (16) is used in the
weighted sum implementation of the overall filter
only when the transition is assumed to occur at
$\ell = k - 1$, where k is the current observation
sample.

3.3  The  Overall  Filtering  Scheme 

The overall scheme involves the evaluation
of the likelihood functionals (11) that are used
in the weighted sum approximation (7). Since the
probability of transition is assumed to be either
large or small, it can be assumed that all the
resulting filters are a priori equally likely.
If in addition a higher order approximation is
required for the fast modes by allowing jumps
after more than one sample, then a second order
approximation is obtained by allowing a jump
after two samples with probability c if the
original jump probability is c. Of course, this
requires the use of double the number of filters.
Similarly, if we need to allow more than one jump
in an interval of length K, then the number of
filters needs to be squared, unless additional
approximations are used.

The overall first order filter approximation
is therefore given by a weighted sum of all the
filters defined in (13)-(15) in addition to the
fast filter states for $\ell = k$ as given in (16).
The likelihood functionals are obtained by the
standard expressions involving the correlation
of the individual filter outputs with the
residuals.


The filter needs to be analyzed in terms of
its ability to identify the state of the original
piecewise linear system. Furthermore, its
performance needs to be investigated via simulation.
Preliminary simulation of the original system
indicates that the model is quite good in so far
as the autocorrelation and the density function
of the Markov linear model are concerned, as compared
to those of the piecewise nonlinear system. One possible
modeling assumption may be to shift the regions of
validity of the piecewise linear approximations
so that the transition probabilities may be
equalized.

4. SUMMARY AND CONCLUSIONS

The scheme discussed here is only a
preliminary one, with many of its properties yet to be
investigated. In particular, the initial scalar
simulation needs to be extended both to higher
dimensions and to systems with more than the
three linear regions. A hierarchy of ordering
for the vector case needs to be developed so that
the assumption of a tridiagonal transition
probability matrix may still apply. Finally,
applications to specific nonlinear systems remain to be
carried out so as to demonstrate the utility of
the approach. The approach can be made
systematic in that only the error in the Markov model
should be controlled, while the error in the
identification and filtering would follow from the
modeling error. Implementation issues involved
with the numerical problems of using a large
number of filters are investigated by Verriest
(e.g. [9]).
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ABSTRACT 

This paper is concerned with an approximate
nonlinear filtering scheme for piecewise linear
stochastic systems. The system is modeled by
switched Markovian transitions among linear models.
The optimal estimator for the resulting system
requires an exponentially increasing number of filters
in a combined detection-estimation system. The
approach proposed in this paper reduces the number
of such filters by using a consistency test based
on the original linear regions of the nonlinear
system. In this way an improvement in the accuracy
of the scheme using a fixed number of filters can be
obtained.

INTRODUCTION 

The objective of this paper is to provide a
suboptimal scheme for nonlinear filtering in
systems modeled by piecewise linear nonlinearities.
These systems have been approximated by multimodel
linear systems with Markovian jumps among the
models [1-2]. However, the resulting nonlinear
models require a nonlinear filtering scheme that
uses a detection-estimation structure with a number
of filters that is exponentially increasing [3].
An earlier approach [4] attempted to reduce the
required number of filters in the scheme by
utilizing the sparse nature of the Markovian
transition probability matrix. However, the scheme
based on the approximate model does not take into
account the consistency constraints imposed by the
original model of the nonlinear system. The scheme
to be considered here attempts to overcome both of
these limitations, by employing the constraints of
the original model, and requiring a finite number
(which can be selected to achieve desired accuracy)
of filters for the suboptimal filter. The scheme
is a modification of the finite Gaussian sum
approximation used in [5], with the addition of the
consistency condition imposed by the original
piecewise linear model.

The system under consideration is given by the following discrete
time model

$$x_{k+1} = g(x_k) + w_k \tag{1}$$

where the state of the system at time $t_k$ is $x_k$, and where the
noise sequence $\{w_k\}$ is assumed to be white and Gaussian. The
observation model $y$ is (for the time being) assumed to be linear
with additive white Gaussian noise $v$

$$y_k = C x_k + v_k. \tag{2}$$

The nonlinearity is assumed to have a continuous piecewise linear
approximation given by the following model

$$g(x) = A_i x + b_i, \quad \text{for } x \in \Omega_i,\ i = 1, \ldots, M \tag{3}$$

where the regions $\{\Omega_i\}$ form a partition of the entire space.
The approximation discussed in [1-2] provides the basis for the
nonlinear filter design. In this approximation the system (1) is
assumed to have $M$ different linear models as given by (3), where
model $i$ is valid when the state $S$ (called the macro-state) of an
underlying Markov process is equal to $S_i$. The transition
probability matrix $\Pi$ of the Markov process is derived from the
original system by considering the transitions from $\Omega_i$ to
$\Omega_j$. This may seem like an enormous computational effort,
depending on the complexity of the nonlinearity. On the other hand,
this transition matrix can be precomputed. The steady state
probabilities can be found in the same manner. The validity of the
Markov approximation is based on some assumptions on the linear
regions $\{\Omega_i\}$. Two types of regions are allowed for the
approximation to be appropriate, depending on a measure, $\mu(\cdot)$,
of region size defined relative to the covariance of the white process
noise $w$. The first type is termed a contracting region, satisfying
the relation

$$\mu(g(\Omega_i)) < \mu(\Omega_i).$$

The second type is called an expanding region, satisfying the relation

$$\mu(g(\Omega_i)) > \mu(\Omega_i).$$

Here, the notation $g(\Omega)$ is used to refer to the image of
$\Omega$ under the mapping of the nonlinearity $g$. Furthermore, it is
assumed that the measures of contracting regions are relatively large,
while the measures of expanding regions are relatively small, in order
to ensure the validity of the approximation.

The resulting model is a finite state Markov chain with macro-states
$\{S_i\}$, having transition probability matrix $\{\Pi_{ij}\}$ and
steady state probabilities $\{p_i\}$, where the $\Pi_{ii}$ are very
small for expanding regions. When the macro-state is $S_i$ the system
obeys a linear model (3). The optimal filter for such a model [3]
involves a set of Kalman filters matched to all possible sequences of
macro-states, followed by a weighted sum using the generalized
likelihood function of each sequence. This filter involves an
exponentially increasing number of filters. An earlier approach [4] to
reduce this number to polynomial growth utilized the sparseness of the
transition probability matrix, $\Pi$, and the relative size of the
transitions to the different types of regions. In this paper, an
alternative approach is used that allows a fixed number of filters,
and this number may be expanded depending on the need for accuracy.
The approach is a modification of the Gaussian sum approximation in
[5] that utilizes the structure of the original nonlinear model.

GENERAL  FILTERING  SCHEME 

The scheme is assumed to have $M$ possible filters (such a choice can
be generalized to a number of filters $M^k$, for arbitrary $k$),
yielding at time $k$ a set of $M$ estimates $\hat{x}_i(k)$,
corresponding covariances $P_i(k)$, and estimated probabilities of the
macro-states $p_i(k)$. Hence, at each stage the total information
state update involves the incorporation of the measurement $y(k+1)$
with the prior information state

$$I(k) = \{\hat{x}_i(k),\ P_i(k),\ p_i(k) : i = 1, 2, \ldots, M\}$$

to the new information state $I(k+1)$. At each stage, the first step
is to combine the estimates and their covariances by a weighted sum to
arrive at a single estimate $\hat{x}(k)$ and a single covariance
$P(k)$ using the macro-state probabilities $p_i(k)$. This is the
estimate that is the output of the scheme at stage $k$. Then a
consistency test is made to ensure that the estimate $\hat{x}(k)$ is
consistent with the macro-state probabilities $p_i(k)$. This
consistency test involves an adjustment of the $p_i(k)$ so that they
conform to the position of $\hat{x}(k)$ and its covariance $P(k)$
relative to the regions $\Omega_i$. The consistency update generates
$M$ macro-state probabilities $\bar{p}_i(k)$ to be used in propagating
the information state. In order to update the information for the next
time instant, the single estimate $\hat{x}(k)$ is used together with
the transition probabilities $\Pi$ and the $M$ models (3) to obtain
the information state $I(k+1|k)$ prior to the next measurement. These
estimates are then updated by incorporating the measurement $y(k+1)$
via the usual linear Kalman filter matched to the model governed by
macro-state $S_i$, while likelihood functions are used to obtain the
measurement update of the a posteriori macro-state probabilities. In a
more general setting, more information can be carried from one stage
to the next, so that a sequence of two states can be used to update
the filters if $M^2$ filters are selected.


Consistency


Since this step is the major difference between this approach and
earlier ones, it will be described first. If the variance of the
estimate is small, then the information provided by the estimates of
$p_i(k)$ can be neglected. In this case, these values are changed
based on the position of the estimate $\hat{x}(k)$ in the appropriate
region $\Omega_i$, to generate the a priori macro-state probabilities
$\bar{p}_i(k)$ to be used for the transition to the next stage for the
updating of $p_i(k+1|k)$. If the covariance $P$ is large, then the
macro-state information is relied on more heavily in determining the
macro-state probabilities. One ad hoc way to accomplish this is to use
the following weighted update expression

$$\bar{p}_i(k) = \alpha(P)\, p_i(k) + (1 - \alpha(P))\, U_i[\hat{x}(k)]. \tag{4}$$

Here $\alpha(P)$ is a function of the norm of the covariance of the
estimate that tends to zero as the covariance becomes small, and tends
to 1 as the covariance becomes large. The function $U_i(x)$ is an
indicator function of the region $\Omega_i$ that represents the
macro-state $S_i$, i.e. it is equal to unity if $x \in \Omega_i$, and
zero otherwise.


Time Update of Estimates


We shall address first the question of time updating the macro-state
probabilities $p_i(k+1|k)$. These are updated by using the consistency
updated values $\bar{p}_i(k)$ together with the transition
probabilities. In this case we have

$$p(k+1|k) = \Pi\, \bar{p}(k) \tag{5}$$

where $\bar{p}(k)$ denotes the vector of $\bar{p}_i(k)$'s. The updates
of the estimates $\hat{x}_i(k+1|k)$ and their covariances
$P_i(k+1|k)$ are obtained from the combined estimate at stage $k$ and
the models described by (3), to yield

$$\hat{x}_i(k+1|k) = A_i\, \hat{x}(k) + b_i \tag{6}$$

$$P_i(k+1|k) = A_i\, P(k)\, A_i' + Q \tag{7}$$

where $Q$ is the process noise covariance. This approach assumes in
essence that the distribution of the state $x$ satisfies a Gaussian
sum approximation. This implies that the update is obtained by using
$M$ Kalman filters matched to the linear models described in (3), with
initial value at $k$ given by the combined estimate $\hat{x}(k)$ and
its covariance $P(k)$. Again, in the case of $M^2$ filters, for
example, we have $M$ combined estimates at time $k$, and each can be
subject to a transition based on one of the $M$ macro-states, to yield
$M^2$ estimates. These in turn will be combined again to provide
another set of $M$ estimates for propagation to the next stage. Some
of these transitions may not be possible due to the structure of the
transition probability matrix. In such a case, the number $M^2$ serves
only as an upper bound on the number of filters used. These updated
estimates will not be combined until after the measurement updates,
which are applied to each of the individual estimates corresponding to
each macro-state.


Measurement Update of Estimates


The estimates after the measurement $y(k+1)$ is available are derived
using the models in (3) to yield the standard Kalman filter
formulation

$$\hat{x}_i(k+1) = \hat{x}_i(k+1|k) + P_i(k+1)\, C' R^{-1} \left( y(k+1) - C\, \hat{x}_i(k+1|k) \right) \tag{8}$$

$$P_i(k+1) = \left( [P_i(k+1|k)]^{-1} + C' R^{-1} C \right)^{-1}. \tag{9}$$

The remaining question concerns the measurement update of the
macro-state probability estimates. This can be accomplished by using
the standard likelihood function for a switched Markov model. It
should be noted that such an update is only valid for the true
switched Markov model, while it is only an approximation in this case.
The expression for the a posteriori probabilities in this case will be
proportional to the likelihood function, given by

$$p_i(k+1) = \rho\, p_i(k+1|k)\, \Lambda_i(k+1) \tag{10}$$

$$\Lambda_i(k) = \exp\left\{ -\tfrac{1}{2} \left( y(k) - C \hat{x}_i(k) \right)' R^{-1} \left( y(k) - C \hat{x}_i(k) \right) \right\} \tag{11}$$

where $\rho$ is a normalization coefficient.

The consistency update used earlier to provide the a priori
information for the transition probabilities is expected to compensate
for the fact that a smaller number of filters is used than warranted
by the optimal estimate for the switched Markov approximation. The
fact that these macro-states originate in a physical region is used to
correct the estimate of the likelihood function representing the a
posteriori probabilities of the macro-states.
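
As an illustration only, one time update followed by one measurement
update can be sketched for a scalar state, following equations (5)
through (11) as reconstructed from the text. The function names, the
convention `Pi[i][j] = Prob(next macro-state j | current macro-state i)`,
and the use of the information-form gain $P_i(k+1) C' R^{-1}$ (the form
consistent with (9)) are assumptions of this sketch, not statements
about the authors' implementation.

```python
import math

def time_update(x_hat, P, p_bar, models, Pi, Q):
    """Scalar sketch of eqs. (5)-(7); models[i] = (A_i, b_i)."""
    M = len(models)
    # Assumed convention: Pi[i][j] = Prob(next = j | current = i).
    p_pred = [sum(p_bar[i] * Pi[i][j] for i in range(M)) for j in range(M)]
    x_pred = [A * x_hat + b for (A, b) in models]   # eq. (6)
    P_pred = [A * P * A + Q for (A, _) in models]   # eq. (7)
    return x_pred, P_pred, p_pred

def measurement_update(x_pred, P_pred, p_pred, y, C, R):
    """Scalar sketch of eqs. (8)-(11)."""
    x_upd, P_upd, weights = [], [], []
    for xp, Pp, pp in zip(x_pred, P_pred, p_pred):
        Pu = 1.0 / (1.0 / Pp + C * C / R)           # eq. (9)
        xu = xp + Pu * C / R * (y - C * xp)         # eq. (8)
        # eq. (11), evaluated at the updated estimate as written in the text
        lam = math.exp(-0.5 * (y - C * xu) ** 2 / R)
        x_upd.append(xu)
        P_upd.append(Pu)
        weights.append(pp * lam)
    total = sum(weights)                            # 1/rho in eq. (10)
    p_upd = [w / total for w in weights]
    return x_upd, P_upd, p_upd
```

The normalization in the last step assumes at least one weight is
nonzero; a practical version would guard against numerical underflow.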


Combined  Estimate 

The combined estimate $\hat{x}(k)$ is obtained by using the likelihood
weighted probabilities of the macro-states in a weighted sum of the
individual estimates, as dictated by the optimal scheme for the
switched Markov model

$$\hat{x}(k) = \sum_i p_i(k)\, \hat{x}_i(k). \tag{12}$$

The covariance for the combined estimate can be obtained in a similar
fashion by assuming a Gaussian sum approximation, to yield the
expression

$$P(k) = \sum_i p_i(k) \left\{ P_i(k) + \hat{x}_i(k)\, \hat{x}_i'(k) \right\} - \hat{x}(k)\, \hat{x}'(k) \tag{13}$$

where the validity of the approximation depends on the validity of the
switched Markov model.

The overall updating steps involved in the scheme are illustrated in
Figure 1.


Figure 1. Block Diagram of the Filter Stages
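
The combination step of equations (12) and (13) is a moment match of a
Gaussian mixture. A minimal scalar sketch is given below; the function
name `combine` and the list-based interface are assumptions made for
illustration.

```python
def combine(p, x, P):
    """Scalar form of eqs. (12)-(13): mean and variance of a mixture.

    p: macro-state probabilities (summing to one)
    x: per-filter estimates
    P: per-filter variances
    """
    x_c = sum(pi * xi for pi, xi in zip(p, x))                   # eq. (12)
    second_moment = sum(pi * (Pv + xi * xi) for pi, xi, Pv in zip(p, x, P))
    P_c = second_moment - x_c * x_c                              # eq. (13)
    return x_c, P_c
```

For two equally likely estimates at -1 and +1, each with variance 0.1,
the combined mean is 0 and the combined variance is 1.1: the spread
between the individual estimates adds to the per-filter variance.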


ANALYSIS OF THE FILTER

The complexity of the filter precludes analytical derivation of its
performance. One has to rely on simulation and other asymptotic
techniques to address the question of performance and convergence.
Several observations can be made relative to the behavior of the
filter. The filter performance would largely depend on the accuracy of
the switched Markov approximation for the piecewise linear system.
Hence, the filter is expected to perform well when the process noise
covariance is large relative to the expanding regions of the
nonlinearity, and small relative to the contracting regions of the
nonlinearity. The approximation is such that it can be improved by
increasing the fixed number of filters used in the scheme. It is thus
possible to improve the performance by taking more stages of memory in
the scheme. Finally, the scheme should perform better than a purely
switched Markov approximation even when the approximation itself is
not too good, due to the involvement of the consistency updating that
relies on the exact model of the system. The consistency updating is,
at present, based on an ad hoc formulation. There is room for
improvement in selecting an optimal choice for the weighting function
$\alpha(P)$. In the next section a scalar case is simulated in order
to illustrate the behavior of the filtering scheme.
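
To make the role of the weighting function concrete, the consistency
update $\bar{p}_i = \alpha(P) p_i + (1 - \alpha(P)) U_i[\hat{x}]$ can
be sketched for a scalar state whose regions are intervals. The paper
does not specify $\alpha(P)$; the choice `P / (P + scale)` below is a
hypothetical example that merely satisfies the stated limits (zero for
small covariance, one for large covariance). The names `alpha`,
`region_index`, and `consistency_update` are likewise assumptions.

```python
import bisect

def alpha(P, scale=1.0):
    """Hypothetical weighting: tends to 0 as P -> 0 and to 1 as P grows."""
    return P / (P + scale)

def region_index(x, breakpoints):
    """Index of the interval containing x, for sorted scalar breakpoints."""
    return bisect.bisect_right(breakpoints, x)

def consistency_update(p, x_hat, P, breakpoints, scale=1.0):
    """Blend macro-state probabilities with the region indicator of x_hat."""
    a = alpha(P, scale)
    k = region_index(x_hat, breakpoints)
    return [a * pi + (1.0 - a) * (1.0 if i == k else 0.0)
            for i, pi in enumerate(p)]
```

With a very small variance the result collapses onto the region that
contains the estimate; with a very large variance the probabilities
are left essentially unchanged. The output sums to one whenever the
input does.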

SCALAR  CASE 

A special case, which is also used for a numerical example to
demonstrate some of the properties of the filter, is considered here.
A scalar system, which is parameterized by four parameters and has
three regions, is used. The system and observation model are given by

$$x_{k+1} = \begin{cases} a_1 x_k + a_0 - a_1 + b\, w_k, & \text{for } x_k > 1 \\ a_0 x_k + b\, w_k, & \text{for } -1 \le x_k \le 1 \\ a_1 x_k - a_0 + a_1 + b\, w_k, & \text{for } x_k < -1 \end{cases} \tag{14}$$

$$y_k = c\, x_k + v_k \tag{15}$$

where the noise sequences $w$ and $v$ are white Gaussian with zero
means and unit variances. The analysis in this paper is restricted to
the case $a_0 > 1$ and $-1 < a_1 < 1$, which yields a stable system
with two contracting regions and one expanding region. The
deterministic system has two stable equilibrium points at $\pm x^*$,
where

$$x^* = (a_0 - a_1)/(1 - a_1) > 1. \tag{16}$$

Two cases are considered. The first involves the case of (relatively)
small process noise, namely $b \ll (x^* - 1)$. In this case the
probability of transitions from the contracting regions is very small,
and the steady state probability density function of $x_k$ may be
approximated by a Gaussian sum of two densities with means at
$\pm x^*$ and variance

$$\sigma^2 = b^2/(1 - a_1^2).$$

In this case the estimation problem becomes basically a problem in
detection. However, the resulting model satisfies the assumptions that
render the switched Markov model a valid one for the system. The
second case is the one with $b \gg x^*$, in which case we can rewrite
the system equation as

$$x_{k+1} = a_1 x_k + b \left\{ w_k + \frac{a_0 - a_1}{b}\, \psi(x_k) \right\} \tag{17}$$

where $\psi(x)$ is a nonlinearity with a limiter characteristic. Due
to the assumption on the magnitude of $b$, the additive term to the
noise is negligible and the system behaves essentially as a linear
system. The range of interest should therefore lie between the two
cases discussed above: even though the Markov approximation is better
for the first case, the behavior of the system there allows simpler
approaches.
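
The scalar nonlinearity and its two equivalent descriptions can be
checked numerically. The sketch below assumes the piecewise form with
breakpoints at $\pm 1$ and slopes $a_1$, $a_0$, $a_1$, and assumes
that the limiter form is $a_1 x + (a_0 - a_1)\psi(x)$ with $\psi$ a
unit saturation; the function names are illustrative only.

```python
def g(x, a0, a1):
    """Continuous three-region piecewise linear map (noise-free part)."""
    if x > 1.0:
        return a1 * x + a0 - a1
    if x < -1.0:
        return a1 * x - a0 + a1
    return a0 * x

def limiter(x):
    """Unit saturation: identity on [-1, 1], clipped outside."""
    return max(-1.0, min(1.0, x))

def g_limiter_form(x, a0, a1):
    """Assumed equivalent form: linear part plus a scaled limiter."""
    return a1 * x + (a0 - a1) * limiter(x)

def equilibrium(a0, a1):
    """Positive equilibrium x* of the noise-free system; -x* is the other."""
    return (a0 - a1) / (1.0 - a1)
```

For $a_0 = 10$ and $a_1 = -0.2$ the equilibrium is $x^* = 8.5$, and
`g(8.5, 10, -0.2)` returns 8.5 up to rounding, which confirms that the
point is fixed under the map.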

The symmetry of the problem allows the derivation of the transition
matrix of the macro-states to involve only three terms. These can be
either derived directly or, in cases of unknown noise parameters, we
may assume values that are compatible with high transition
probabilities from the expanding region and low transition
probabilities from the contracting regions. If we use subscripts of
+, -, and 0 to denote the three regions, we need only derive the
transition probabilities $P_{++}$, $P_{+-}$, and $P_{0+}$. The
remaining probabilities are obtained by normalization and symmetry.
The state of the system involves the a posteriori probabilities of
being in one of the three macro-states at observation time $k$, and
the estimate of the state and its covariance given any particular
sequence of states. The approximation in deriving the filter removes
the dependence on an entire sequence, and relies on only a finite
number of steps. In order to compensate for the loss of information,
the probabilities of being in a given macro-state are updated using
the consistency updates described in the previous section. The filter
will involve three estimates, with their corresponding covariances and
the three macro-state probabilities, which are used to obtain the
combined weighted estimate of the system.
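
A short sketch of how the full three-state transition matrix follows
from the three terms by normalization and symmetry is given below. The
ordering of the macro-states as (+, 0, -), the function name, and the
numerical values in the usage note are assumptions for illustration;
they are not taken from the paper.

```python
def symmetric_transition_matrix(p_pp, p_pm, p_0p):
    """Build the 3x3 matrix, rows and columns ordered as (+, 0, -).

    p_pp: Prob(+ -> +), p_pm: Prob(+ -> -), p_0p: Prob(0 -> +).
    """
    p_p0 = 1.0 - p_pp - p_pm      # normalization of the + row
    p_00 = 1.0 - 2.0 * p_0p       # symmetry: Prob(0 -> -) = Prob(0 -> +)
    if p_p0 < 0.0 or p_00 < 0.0:
        raise ValueError("the three terms do not define valid probabilities")
    return [
        [p_pp, p_p0, p_pm],       # from +
        [p_0p, p_00, p_0p],       # from 0
        [p_pm, p_p0, p_pp],       # from - (mirror image of the + row)
    ]
```

For instance, `symmetric_transition_matrix(0.98, 0.0, 0.45)` describes
a chain that leaves the expanding region with high probability and
rarely leaves a contracting region.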

Simulation Results

The filter was simulated for a range of values of the parameters $b$,
$c$, $a_0$, and $a_1$. The sample variance of the combined filter (C)
was compared to the extended Kalman filter (E) and to the linear
equivalent filter (L). The results for the values

$$a_0 = 10, \quad a_1 = -0.2,$$

and several values of $b$ and $c$ were obtained for 1000 samples and
are shown in Table 1.


The results indicate that, as expected, the new scheme performs best
when the process noise variance, which is determined by $b$, is high
relative to the size of the unstable region but small relative to the
stable regions. The effect of the measurement noise also shows that
its performance gets worse relative to the other two filters as the
measurement noise is decreased, i.e. as $c$ is increased. The results
tend to confirm several of the assumptions made when the switched
Markov approximation was considered. It is expected that the
performance can be improved if additional filters are used, and if the
sparsity of the transition probability matrix is utilized in
determining the viable macro-states as was done in [4], in addition to
the consistency updates.

SUMMARY  AND  CONCLUSIONS 

This paper considered a suboptimal filtering scheme for the nonlinear
estimation problem in systems with piecewise linear models. The
approximation used is based on utilizing the switched Markov model for
the system, as well as modifying the resulting filter with the
physical constraints of the states of the model. Additional
improvements are possible by incorporating some features that reflect
the fact that the transition probability matrix has special
characteristics involving fast and slow transitions. Additional
properties, such as convergence and the optimal choice of the
consistency updating function, need further investigation.
Applications to nonlinear tracking and guidance problems are also
contemplated.
SUMMARY

This paper is concerned with an approximate nonlinear filtering scheme for piecewise linear stochastic
systems. Such nonlinearities may be useful in modeling the geometric nonlinearities in an air-to-air
tracking and guidance problem. The nonlinearity is assumed to be present both in the system model and the
observation model. The system is modeled by switched Markovian transitions among linear models. The
optimal estimator for the resulting system requires an exponentially increasing number of filters in a
combined detection-estimation scheme. The approach proposed in this paper reduces the number of such
filters by using a consistency test based on the original linear regions of the nonlinear system. In this
way an improvement in the accuracy of the scheme using a fixed number of filters can be obtained. An
illustrative example demonstrates the improvements provided by the scheme.


INTRODUCTION 

In an air-to-air engagement scenario, the trajectories of the missile and target are usually nonlinear
in nature. Furthermore, the observations used to track the target are typically nonlinear due to the
geometry, even if the target is flying a steady rectilinear trajectory. This paper is concerned with the
problem of tracking the target in such a nonlinear system from noisy nonlinear observations. The solution
for such a nonlinear filtering scheme is not implementable exactly, even if one ignores the uncertainties
in the models of the system. These systems have been approximated by nonlinearities with piecewise linear
characteristics. The objective of this paper is to provide a suboptimal scheme for the nonlinear
filtering of the states of the systems when those systems are modeled by such piecewise linear
nonlinearities. In earlier papers [1-2] these systems have been approximated by multimodel linear
subsystems with Markovian jumps among these models. However, the resulting nonlinear models require a
nonlinear filtering scheme that uses a detection-estimation structure with a number of filters that is
exponentially increasing [3]. An earlier approach [4] attempted to reduce the required number of filters
in the scheme by utilizing the sparse nature of the Markovian transition probability matrix. However, the
scheme based on the approximate model does not take into account the consistency constraints imposed by
the original model of the nonlinear system. A scheme that imposes a consistency condition on the
resulting filters was shown to be effective in improving the filter performance without increasing the
number of filters [5]. The scheme overcame both of these limitations by employing the constraints of the
original model and by requiring a finite number (which can be selected to achieve desired accuracy) of
filters for the suboptimal filter. The scheme is a modification of the finite Gaussian sum approximation
used in [6], with the addition of the consistency condition imposed by the original piecewise linear
model. This paper investigates the scheme further by generalizing it to the case of nonlinearities in
the observation process. It also considers the case of allowing the memory of the suboptimal filter to be
larger by keeping a number of filters over several observation samples.

The system under consideration is given by the following discrete time model

$$x_{k+1} = g(x_k) + b\, w_k \tag{1}$$

where $x_k$ is the n-dimensional state vector of the system at time $k$, and where the noise sequence
$\{w_k\}$ is assumed to be white and Gaussian with covariance matrix $Q$. The observation model $y_k$
(an m-dimensional vector) is assumed to be nonlinear with additive white Gaussian noise $v_k$ with
covariance $R$

$$y_k = h(x_k) + v_k. \tag{2}$$

The nonlinearities $g(x)$ and $h(x)$ are assumed to have continuous piecewise linear approximations given
by the following model

$$g(x) = G_i x + g_i, \quad \text{for } x \in \Omega_{gi},\ i = 1, \ldots, M_g \tag{3}$$

$$h(x) = H_j x + h_j, \quad \text{for } x \in \Omega_{hj},\ j = 1, \ldots, M_h \tag{4}$$

where $G_i$ and $H_j$ are constant matrices, $g_i$ and $h_j$ are constant vectors, and where the regions
$\{\Omega_{gi}\}$ and $\{\Omega_{hj}\}$ form partitions of the entire space. The two partitions may be
combined for simplicity of notation: a new partition containing all nonempty intersections of the sets
$\Omega_{gi}$ and $\Omega_{hj}$, for all $i$ and $j$, provides a partition with sets $\Omega_i$,
$i = 1, 2, \ldots, M$, where $M$ is no larger than $M_g M_h$.
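
For a scalar state, where each partition is a set of intervals, the
combined partition is simply the common refinement obtained by merging
the breakpoints. The sketch below illustrates this special case only;
the function names are hypothetical and the general n-dimensional
intersection of regions is not addressed.

```python
import bisect

def combine_partitions(breaks_g, breaks_h):
    """Breakpoints of the common refinement of two interval partitions."""
    return sorted(set(breaks_g) | set(breaks_h))

def region_index(x, breaks):
    """Index of the interval of the partition that contains x."""
    return bisect.bisect_right(breaks, x)

def model_pair(x, breaks_g, breaks_h):
    """Indices (i, j) of the g-model and h-model that are active at x."""
    return region_index(x, breaks_g), region_index(x, breaks_h)
```

With breakpoints `[-1, 1]` for g (three regions) and `[0]` for h (two
regions), the combined partition has breakpoints `[-1, 0, 1]`, that
is, four regions, which respects the bound of six.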

The approximation for (1) discussed in [1-2] provides the basis for the nonlinear filter design. In
this approximation the system (1) is assumed to have $M_g$ different linear models as given by (3), where
model $i$ is valid when the state $S(k)$ (called the macro-state) of an underlying Markov process is equal
to $S_i$, where $i$ ranges from 1 to $M_g$. The transition probability matrix $\Pi$ of the Markov process
is derived from the original system by considering the transitions from $\Omega_{gi}$ to $\Omega_{gj}$.
This may seem like an enormous computational effort, depending on the complexity of the nonlinearity. On
the other hand, this transition matrix can be precomputed. The steady state probabilities can be found in
the same manner. The same assumption may also be made about the observation nonlinearity $h(x)$. Hence
the approximation is assumed to hold for the overall combined sets $\{\Omega_i\}$. The Markov linear
approximation yields the following representation for the system

$$x_{k+1} = G_i x_k + g_i + b\, w_k \tag{5}$$

$$y_k = H_i x_k + h_i + v_k \tag{6}$$

when $S(k) = S_i$, $i = 1, 2, \ldots, M$,

where we have combined the two models for $g$ and $h$ and their regions, and where the macro-state $S_i$
corresponds to the region $x \in \Omega_i$. The approximation involves two assumptions. The first is
that the transition probabilities from state $S_i$ at time $k$ to state $S_j$ at time $k+1$ are not
dependent on the actual value of the state $x_k$. The second is the approximation that the models given
by (5)-(6) are not restricted by the constraints of the nonlinearities.

In general the transition probability from $S_i$ to $S_j$ may be obtained from the following expression

$$\Pi_{ij} = \mathrm{Prob}\{\, x_{k+1} \in \Omega_j \mid x_k \in \Omega_i \,\} \tag{7}$$

where $\Pi$ is the transition probability matrix for the Markov chain defined by $S$. The Markov chain
also has corresponding steady-state marginal probabilities $p_i$ of the macro-states $S_i$, obtained from

$$p = p\, \Pi \tag{8}$$

where $p$ is a row vector with components $p_i$. The validity of the Markov approximation is based on some
assumptions on the linear regions $\{\Omega_i\}$ relative to the noise variances. Two types of regions
are allowed for the approximation to be appropriate, depending on a measure, $\mu(\cdot)$, of region size
defined relative to the covariance of the white noise processes. The first type is termed a contracting
region, satisfying the relation

$$\mu(q(\Omega_i)) < \mu(\Omega_i). \tag{9}$$

The second type is called an expanding region, satisfying the relation

$$\mu(q(\Omega_i)) > \mu(\Omega_i). \tag{10}$$

Here, $q$ stands for the joint function defined by the intersection of $g$ and $h$, and the notation
$q(\Omega)$ is used to refer to the image of $\Omega$ under the mapping of the nonlinearity $q$.
Furthermore, it is assumed that the measures of contracting regions are relatively large, while the
measures of expanding regions are relatively small, in order to ensure the validity of the approximation.
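
Once a transition matrix is available, the steady-state row vector
satisfying $p = p\,\Pi$ can be approximated by repeated
multiplication. The sketch below uses plain power iteration, which is
one common way to do this and is not claimed to be the method used by
the authors; it assumes the chain has a unique stationary distribution
to which the iteration converges. The function name and tolerances are
illustrative.

```python
def steady_state(Pi, iters=10000, tol=1e-12):
    """Approximate the row vector p with p = p Pi by power iteration.

    Pi[i][j] is assumed to be Prob(next = j | current = i), rows sum to 1.
    """
    M = len(Pi)
    p = [1.0 / M] * M                       # start from the uniform vector
    for _ in range(iters):
        nxt = [sum(p[i] * Pi[i][j] for i in range(M)) for j in range(M)]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p
```

For the two-state matrix `[[0.9, 0.1], [0.5, 0.5]]` the balance
condition gives $p = (5/6, 1/6)$, which the iteration reproduces.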

The resulting model is a finite state Markov chain with macro-states $\{S_i\}$, having transition
probability matrix $\{\Pi_{ij}\}$ and steady state probabilities $\{p_i\}$, where the $\Pi_{ii}$ are very
small for expanding regions and relatively large for contracting regions. When the macro-state $S(k)$ is
equal to $S_i$ the system obeys a linear state and observation model (5)-(6).

The optimal filter for such a model (also called a switched parameter model) [3] involves a set of
Kalman filters matched to all possible sequences of macro-states, followed by a weighted sum using the
generalized likelihood function of each sequence. This filter involves an exponentially increasing number
of filters. An earlier approach [4] to reduce this number to polynomial growth utilized the sparseness of
the transition probability matrix, $\Pi$, and the relative size of the transitions to the different types
of regions. In an earlier paper [5] an alternative approach was used that allows a fixed number of
filters, and this number may be expanded depending on the need for accuracy. The approach is in some
sense a modification of the Gaussian sum approximation in [6], but one which also utilizes the structure
of the original nonlinear model. The approach, which was restricted to state nonlinearities, is
generalized in this paper to include the observation nonlinearity.


GENERAL  FILTERING  SCHEME 

The scheme is assumed to have $M^r$ possible filters, depending on the number, $r$, of levels of memory
utilized by the scheme. This memory reflects the maximum number of possible sequences of transitions of
the macro-states propagated by the scheme. Let $J(k)$ denote a particular possible sequence of macro-state
transitions involving $r$ samples ending at time instant $k$. In other words there are $M^r$ possible such
sequences given by

$$\{J(k)\} = \{ j_{k-r+1}, \ldots, j_{k-1}, j_k \} \tag{11}$$

where

$$\{ j_i = 1, 2, \ldots, M;\ i = k-r+1, \ldots, k-1, k \}$$

and where each index $j_i$ represents the value of the macro-state at time $i$ within the particular
sequence. We also denote by $J(k;i)$ a particular sequence $J(k)$ that ends in macro-state $i$ at time
$k$, i.e.,

$$J(k;i) = \{ J(k-1),\ j_k = i \} \tag{12}$$

In  general  at  time  k,  the  scheme  yields  a  set  of  Hr  estimates  xj(k)(k),  corresponding  covariances 
Pj(K)(k),  and  estimated  probabilities  of  the  macro-state  sequences  obtained  as  normalized  likelihood 
functions  Aj(k)Ot)  reflecting  the  aposteriori  probability  estimate  of  the  sequence  of  r  transitions  of  the 
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macro-states  corresponding  to  the  sequence  of  integers  defined  by  J(k).  Hence,  at  each  stage  the  total 
information  state  update  involves  the  incorporation  of  the  measurement  y(k+l)  with  the  prior  information 
state 


Ilk)  *  (xj(k)lk).  Pj(k)(k)<  AJ(k)(k)l 


(13) 


to  the  new  Information  stace  I(k+1).  Ac  each  staga,  tha  first  step  Is  to  combine  tha  estimates  and  their 
covariances  by  a  weighted  sum  to  arrive  at  a  single  estimate  i(k)  and  a  single  covariance  P(k)  using  tha 
macro-state  sequence  aposterlorl  probabilities.  This  is  tha  estimate  that  is  the  output  of  the  schema  at 
stage  k  and  Is  given  in  general  by 


ilk) 


*  E  *Kk)<k)  AJ(!0(k>. 
J(k) 


(14) 


where  the  emanation  is  over  all  Mr  possible  sequences  J(k).  In  order  Co  update  tha  informaClon  state  it 
is  convaniant  to  dafina  tha  astiaatas  and  likalihood  functions  that  correspond  to  saquancas  of  tha  form 
J(k;i).  Thera  ara  Mr~l  such  saquancas  ending  in  macro-state  i.  These  estimates  and  thair  covariance  will 
be  denoted  as  above  with  tha  subscript  J(k;i)  instead  of  J(k).  In  this  case  wa  can  determine  the  a 
posteriori  probability  of  tha  macro-state  at  time  k  equal  to  i  by  p^(k)  and  expressed  as 


Pl<k>  -(I  *J(ksl)(k). 


(15) 


Similarly, we can define the conditional estimate at time $k$ based on the sequence $J(k-1)$ as a weighted
sum of the estimates corresponding to $J(k;i)$, after averaging over all possible current macro-states. The
resulting estimate is given by

$$\hat{x}_{J(k-1)}(k) = \Big\{ \sum_i \hat{x}_{J(k;i)}(k)\, \Lambda_{J(k;i)}(k) \Big\} \Big/ \Lambda_{J(k-1)}(k) \tag{16}$$

where

$$\Lambda_{J(k-1)}(k) = \sum_i \Lambda_{J(k;i)}(k) \tag{17}$$


The conditional probability of the current macro-state given the sequence $J(k-1)$ may also be derived in
a similar manner:

$$p_i(k \mid J(k-1)) = \Lambda_{J(k;i)}(k) \,/\, \Lambda_{J(k-1)}(k) \tag{18}$$

This estimate provides another representation for the overall estimate:

$$\hat{x}(k) = \sum_{J(k-1)} \hat{x}_{J(k-1)}(k)\, \Lambda_{J(k-1)}(k) \tag{19}$$

The rationale for such a representation is that it provides an additional way to check the consistency of
the estimate, by updating the conditional probability of the current macro-state using the estimate of the
state. This consistency test is made to ensure that the estimate $\hat{x}_{J(k-1)}(k)$ that may be used to
propagate to the next stage is consistent with the macro-state probabilities $p_i(k \mid J(k-1))$. The test
involves an adjustment of the macro-state probabilities to conform to the position of the state estimate,
and its conditional covariance $P_{J(k-1)}(k)$, relative to the region $\Omega_i$. The consistency update
generates $M$ macro-state conditional probabilities $p_i(k \mid J(k-1))$ to be used in propagating the
information state to the next stage. In order to update the information for the next time instant, the
$M^r$ estimates need to be aggregated by averaging over the earliest time instant. The filters are then
updated by using the remaining estimates together with the transition probabilities $\Pi$ and the $M$
models (5)-(6) to obtain the information state $I(k+1 \mid k)$ prior to the next measurement. These
estimates are then updated by incorporating the measurement $y(k+1)$ (corresponding to the appropriate
model of the macro-state) via the usual linear Kalman filter matched to the model governing macro-state
$S_i$, while likelihood functions are used to obtain the measurement update of the a posteriori macro-state
probabilities.


This approach can still be combined with other approaches for reducing the filter complexity and the
number of filters required. These include exploiting the sparseness of the transition probability matrix
of the macro-states and the relatively small or large probability of transitions for certain states [4].
Other approaches [7] involve the aggregation of estimates that are approximately equal in terms of mean
and variance, the elimination of sequences that are unlikely based on a posteriori probabilities, and the
combining of filters whose distance measures are smaller than a certain value. These approaches are
expected to further enhance the utility of the proposed approach.


CONSISTENCY  UPDATE 

Since this step is the major difference between this approach and earlier ones, it will be described
first. If the variance of the estimate is small, then the information provided by the estimates of
$p_i(k \mid J(k-1))$ can be neglected. In this case, these values are changed based on the position of the
estimate $\hat{x}_{J(k-1)}(k)$ in the appropriate region $\Omega_i$, to update the a posteriori macro-state
probabilities $p_i(k \mid J(k-1))$ to be used for the transition to the next stage for the updating of
$p_i(k+1 \mid k)$. If the covariance $P_{J(k-1)}(k)$ is large, then the macro-state information is relied on
more heavily in determining the macro-state probabilities. One ad hoc way to accomplish this is to use the
following weighted update expression, in which the left-hand side denotes the consistency-updated value:

$$p_i(k \mid J(k-1)) = a(P_{J(k-1)})\, p_i(k \mid J(k-1)) + [1 - a(P_{J(k-1)})]\, U_i[\hat{x}_{J(k-1)}(k)] \tag{20}$$

where the dependence of $P$ on $k$ is omitted for ease of presentation. Here $a(P)$ is a function of the norm
of the covariance of the estimate that tends to zero as the covariance becomes small and tends to 1 as
the covariance becomes large. The function $U_i(x)$ is an indicator function of the region $\Omega_i$ that
represents the macro-state $S_i$, i.e., it is equal to unity if $x \in \Omega_i$, and zero otherwise.
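
The weighted update (20) can be sketched for the scalar case as follows. This is only an illustration under stated assumptions: the weighting function `weight` (here $a(P) = P/(P + \text{scale})$) is a hypothetical choice that merely satisfies the two limits required of $a(P)$, and the region boundaries at $\pm 1$ are assumed for the example, not taken from the text.

```python
def weight(p_norm, scale=1.0):
    """Hypothetical a(P): equals 0 for zero covariance and tends to 1 as it grows."""
    return p_norm / (p_norm + scale)


def indicator(x, regions):
    """U_i(x): 1.0 for the region that contains x, 0.0 for all others."""
    return [1.0 if lo <= x < hi else 0.0 for lo, hi in regions]


def consistency_update(p, x_hat, p_cov, regions, scale=1.0):
    """Blend prior macro-state probabilities p with the region indicator, as in (20)."""
    a = weight(abs(p_cov), scale)
    u = indicator(x_hat, regions)
    return [a * p_i + (1.0 - a) * u_i for p_i, u_i in zip(p, u)]


# Assumed regions for illustration: x < -1, -1 <= x < 1, x >= 1.
regions = [(float("-inf"), -1.0), (-1.0, 1.0), (1.0, float("inf"))]
p_small = consistency_update([0.2, 0.5, 0.3], 2.0, 0.0, regions)  # trust the estimate
p_large = consistency_update([0.2, 0.5, 0.3], 2.0, 1e9, regions)  # trust the prior
```

Because the result is a convex combination of two probability vectors, it remains normalized for any value of the weight.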


TIME UPDATE OF ESTIMATES


We shall address first the question of time updating the macro-state probabilities $p_i(k+1 \mid k)$. These
are updated by using the consistency-updated values $p_i(k)$ together with the transition probabilities. In
this case the conditional probabilities are propagated (before obtaining the next measurement) by
multiplying by the transition probabilities of the macro-states as given by the matrix $\Pi$. Explicitly,
this can be written as

$$\Lambda_{J(k+1;i)}(k+1 \mid k) = p_j(k \mid J(k-1))\, \Lambda_{J(k-1)}(k)\, \Pi_{ji}$$

where the left side of the equation is the probability that at time $k+1$ the macro-state ends with state $i$
preceded by the particular sequence $J(k)$, conditional on the data up to and including time $k$. The updates
of the estimates $\hat{x}_{J(k+1)}(k+1 \mid k)$ and their covariances $P_{J(k+1)}(k+1 \mid k)$ are obtained
from the estimates at stage $k$ and the models described by (5)-(6), to yield

$$\hat{x}_{J(k+1)}(k+1 \mid k) = G_i\, \hat{x}_{J(k;i)}(k) + g_i$$

$$P_{J(k+1)}(k+1 \mid k) = G_i\, P_{J(k;i)}(k)\, G_i' + B B'$$

where a prime is used to denote transposition. This approach assumes in essence that the distribution of
the state $x$ satisfies a Gaussian sum approximation. This implies that the update is obtained by using $M$
Kalman filters matched to the linear models described in (5)-(6), with initial value at $k$ given by the
estimates matched to each possible preceding macro-state sequence and its covariance. These in turn will
be aggregated again to reduce the total number of filters to $M^r$ for propagation to the next stage. Some
of these transitions may not be possible due to the structure of the transition probability matrix. In
such a case, the number $M^r$ serves only as an upper bound on the number of filters used. These updated
estimates will not be combined until after the measurement updates, which are applied to each of the
individual estimates corresponding to each macro-state. Again, in the case of $M$ filters, i.e., $r = 1$, we
have a single combined estimate at time $k$, and it is propagated based on the transition to any one of the
$M$ macro-states, to yield $M$ estimates.
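
A minimal scalar sketch of this time update is given below. It assumes a scalar model of the form $x(k+1) = G_i x(k) + g_i + b\,w(k)$ with unit-variance noise $w$, so that the process noise contributes $b^2$ to the predicted variance. The function names, the transition matrix `Pi`, and the numerical values are hypothetical and chosen only for illustration.

```python
def propagate_sequence_probabilities(p_cons, lam_prev, Pi):
    """Return lam[j][i] = p_j(k|J(k-1)) * Lambda_{J(k-1)}(k) * Pi[j][i].

    j indexes the macro-state at time k, i the macro-state at time k+1.
    """
    m = len(p_cons)
    return [[p_cons[j] * lam_prev * Pi[j][i] for i in range(m)] for j in range(m)]


def time_update(x_hat, p_cov, G_i, g_i, b):
    """Propagate a scalar estimate and its variance with the model of macro-state i."""
    x_pred = G_i * x_hat + g_i
    p_pred = G_i * p_cov * G_i + b * b
    return x_pred, p_pred


# Hypothetical three-state transition matrix (each row sums to one).
Pi = [[0.9, 0.1, 0.0], [0.3, 0.4, 0.3], [0.0, 0.1, 0.9]]
lam = propagate_sequence_probabilities([0.2, 0.5, 0.3], 0.4, Pi)
x_pred, p_pred = time_update(8.5, 0.1, -0.2, 10.2, 0.5)
```

Since each row of `Pi` sums to one, the propagated values sum to the probability of the parent sequence, so no probability mass is created or lost in the time update.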


MEASUREMENT  UPDATE  OF  ESTIMATES 


The estimates after the measurement $y(k+1)$ becomes available are derived using the models in (5)-(6) to
yield the standard Kalman filter formulation

$$\hat{x}_{J(k+1;i)}(k+1) = \hat{x}_{J(k+1;i)}(k+1 \mid k) + P_{J(k+1;i)}(k+1)\, H_i' R^{-1}\, \nu_i(k+1)$$

$$P_{J(k+1;i)}(k+1) = \{ [P_{J(k+1;i)}(k+1 \mid k)]^{-1} + H_i' R^{-1} H_i \}^{-1}$$

where $\nu_i(k+1)$ is the innovations process based on the macro-state at time $k+1$, defined by

$$\nu_i(k+1) = y(k+1) - H_i\, \hat{x}_{J(k+1;i)}(k+1 \mid k) - h_i$$


The question now concerns the measurement update of the macro-state probability estimates.
This can be accomplished by using the standard likelihood function for a switched Markov model. It should
be noted that such an update is only valid for the true switched Markov model, and it is only an
approximation in this case. The a posteriori probabilities in this case are proportional to
the likelihood functions $\Lambda_{J(k;i)}(k)$, which for simplicity are defined to include the normalizing
constant. The update equations are given in this case by the expression

$$\Lambda_{J(k+1;i)}(k+1) = \delta\, \Lambda_{J(k+1;i)}(k+1 \mid k)\, \exp\{ -\tfrac{1}{2}\, \nu_i'(k+1)\, R^{-1}\, \nu_i(k+1) \}$$

where $\delta$ is a normalization coefficient.
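
For a scalar state, the measurement update of one hypothesis can be sketched as below. This is an illustrative sketch, not the report's implementation: it uses the information form of the Kalman update with the gain built from the updated variance, assumes a measurement model $y = H_i x + h_i + v$ with noise variance $R$, and weights the likelihood by $R^{-1}$ as in the expression above. All function names are hypothetical.

```python
import math


def measurement_update(x_pred, p_pred, lam_pred, y, H_i, h_i, R):
    """Update one scalar hypothesis matched to macro-state i with measurement y."""
    nu = y - H_i * x_pred - h_i                    # innovation
    p_post = 1.0 / (1.0 / p_pred + H_i * H_i / R)  # information-form variance update
    x_post = x_pred + p_post * H_i / R * nu        # gain uses the updated variance
    lam = lam_pred * math.exp(-0.5 * nu * nu / R)  # unnormalized likelihood weight
    return x_post, p_post, lam


def normalize(lams):
    """Scale the likelihood weights so that they sum to one."""
    total = sum(lams)
    return [value / total for value in lams]


x_post, p_post, lam = measurement_update(1.0, 2.0, 0.5, 1.0, 0.5, 0.1, 0.25)
weights = normalize([lam, 0.1])
```

The gain `p_post * H_i / R` is algebraically equal to the covariance-form gain `p_pred * H_i / (H_i * p_pred * H_i + R)`, which gives a simple numerical cross-check.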


The consistency update used earlier to provide the a priori information for the transition probabilities
is expected to compensate for the fact that a smaller number of filters is used than warranted by the
optimal estimate for the switched Markov approximation. The fact that these macro-states originate in a
physical region is used to correct the estimate of the likelihood function representing the a posteriori
probabilities of the macro-states.


COMBINED  ESTIMATE 


The combined estimate $\hat{x}(k)$ is obtained by using the likelihood-weighted probabilities of the
macro-states to form a weighted sum of the individual estimates, as dictated by the optimal scheme for the
switched Markov model:

$$\hat{x}(k) = \sum_{J(k)} \Lambda_{J(k)}(k)\, \hat{x}_{J(k)}(k)$$


The covariance for the combined estimate can be obtained in a similar fashion by assuming a Gaussian sum
approximation, to yield the expression

$$P(k) = \sum_{J(k)} \Lambda_{J(k)}(k)\, \{ P_{J(k)}(k) + \hat{x}_{J(k)}(k)\, \hat{x}_{J(k)}'(k) \} - \hat{x}(k)\, \hat{x}'(k) \tag{29}$$


where the validity of the approximation depends on the validity of the switched Markov model. Again,
these equations only show the composite estimate. The estimates, and their covariances, that are matched to
a specific macro-state are obtained from exactly the same expressions, with a summation over the subscript
$j$ of $J(k)$ in a manner similar to (15) for the estimates.
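
The combination step is the usual moment matching of a Gaussian sum. A minimal scalar sketch, with hypothetical names and values, is:

```python
def combine(lams, xs, ps):
    """Collapse weighted scalar estimates (weights sum to one) into one mean and variance."""
    x = sum(l * x_j for l, x_j in zip(lams, xs))
    p = sum(l * (p_j + x_j * x_j) for l, x_j, p_j in zip(lams, xs, ps)) - x * x
    return x, p


# Two equally likely hypotheses centered at -1 and +1, each with variance 0.1.
x_comb, p_comb = combine([0.5, 0.5], [-1.0, 1.0], [0.1, 0.1])
```

In this example the combined variance (1.1) is much larger than either component variance (0.1) because it also accounts for the spread between the two hypotheses.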


The overall updating steps involved in the scheme are illustrated in Figure 1. Similar but simpler
algorithms can be drawn for the special cases of M = 1 or M = 2.


ANALYSIS OF THE FILTER

The complexity of the filter precludes analytical derivation of its performance. One has to rely on
simulation and asymptotic techniques to address the question of performance and convergence.
Several observations can be made about the behavior of the filter. The filter performance depends
largely on the accuracy of the switched Markov approximation for the piecewise linear system.
Hence, the filter is expected to perform well when the process noise covariance is large relative to the
expanding regions of the nonlinearity and small relative to the contracting regions of the nonlinearity.
The approximation is such that it can be improved by increasing the fixed number of filters used in the
scheme. It is thus possible to improve the performance by taking more stages of memory in the scheme.
Finally, the scheme should perform better than a purely switched Markov approximation even when the
approximation itself is not too good, because the consistency updating relies on the exact model of the
system. The consistency updating is, at present, based on an ad hoc formulation. There is room for
improvement in selecting an optimal choice for the weighting function $a(P)$. In the next section a scalar
case is simulated in order to illustrate the behavior of the filtering scheme.


SCALAR  CASE 

A special case, which is also used as a numerical example to demonstrate some of the properties of
the filter, is considered here. A scalar system in which the $g(x)$ and $h(x)$ functions each have three
regions, symmetric (odd symmetry) about the origin, is used for demonstration. The nonlinearities are shown
in Figure 2 and are seen to be parameterized by five parameters. The system and observation model are given
by equations (30) and (31), where the noise sequences $w$ and $v$ are white Gaussian. This paper is
restricted to the case $g_0 > 1$ and $-1 < g_1 < 1$, which yields a stable system with two contracting
regions and one expanding region. The deterministic system has two stable equilibrium points at $\pm x^*$,
where

$$x^* = (g_0 - g_1)/(1 - g_1) > 1 \tag{32}$$


Two cases are considered. The first involves relatively small process noise, namely
$b \ll (x^* - 1)$. In this case the probability of transitions from the contracting regions is very small,
and the steady-state probability density function of $x_k$ may be approximated by a Gaussian sum of two
densities with means at $\pm x^*$ and variance

$$b^2 / (1 - g_1^2).$$

In this case the estimation problem becomes basically a problem in detection. However, the resulting
model satisfies the assumptions that render the switched Markov model a valid one for the system. The
second case is the one with $b \gg x^*$, in which case we can rewrite the system equation as

$$x_{k+1} = g_1 x_k + \phi(x_k) + b\, w_k \tag{33}$$


where $\phi(x)$ is a nonlinearity with a limiter characteristic. Due to the assumption on the magnitude of
$b$, the term added to the noise is negligible and the system behaves essentially as a linear system. The
range of interest should therefore lie between the two cases discussed above: even though the Markov
approximation is better for the first case, the behavior of the system in that case allows simpler
approaches.
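
The scalar system can be explored numerically with a short sketch. The form of the model used here is an assumption made for illustration: the state equation is taken as $x_{k+1} = g(x_k) + b\,w_k$ with unit-variance white Gaussian $w_k$, and $g$ is taken as a continuous, odd, piecewise-linear map with slope $g_0$ for $|x| \le 1$ and slope $g_1$ outside. The breakpoints at $\pm 1$ are inferred from the equilibrium expression (32) and are not stated explicitly in the text.

```python
import random


def g(x, g0, g1):
    """Assumed piecewise-linear map: slope g0 on [-1, 1], slope g1 outside, continuous."""
    if x > 1.0:
        return g1 * x + (g0 - g1)
    if x < -1.0:
        return g1 * x - (g0 - g1)
    return g0 * x


def simulate(n, g0, g1, b, x0=0.0, seed=0):
    """Simulate n steps of x_{k+1} = g(x_k) + b * w_k with w_k ~ N(0, 1)."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n):
        xs.append(g(xs[-1], g0, g1) + b * rng.gauss(0.0, 1.0))
    return xs


g0, g1 = 10.0, -0.2
x_star = (g0 - g1) / (1.0 - g1)  # equilibrium of the outer (contracting) branch
trajectory = simulate(1000, g0, g1, b=0.5)
```

With $g_0 = 10$ and $g_1 = -0.2$, the values used in the simulations, the expression (32) gives $x^* = 8.5$, and the assumed map indeed leaves $\pm x^*$ fixed.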

The symmetry of the problem allows the derivation of the transition matrix of the macro-states, which
involves only five states because of the nonlinearity in the observations. The transition probabilities
can either be derived directly or, in cases of unknown noise parameters, be assigned values that are
compatible with high transition probabilities from the expanding region and low transition probabilities
from the contracting regions. For $g(x)$ we use subscripts $+$, $-$, and $0$ to denote the three regions;
we need only derive the transition probabilities $\Pi_{++}$ and $\Pi_{0+}$. The remaining probabilities are
obtained by normalization and symmetry. If we include $h(x)$ we obtain in general five macro-states, and
their transitions can be derived in a similar fashion. In the simulation the state transition matrix is
derived experimentally using about 1000 sample steps. Even though the result is not as accurate as an
analytical derivation, especially for low probability states, the filter performance was robust relative
to the values of the a priori transition probabilities. The state of the system involves the a posteriori
probabilities of being in one of the five macro-states at observation time $k$, and the estimate of the
state and its covariance given any particular sequence of states. The approximation in deriving the filter
removes the dependence on an entire sequence, and relies on only a finite number of steps. In order to
compensate for the loss of information, the probabilities of being in a given macro-state are updated
using the consistency updates described in the previous section. The filter will involve five estimates,
with their corresponding covariances and the five macro-state probabilities, which are used to obtain the
combined weighted estimate of the system.
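
An experimental derivation of the transition matrix amounts to counting region-to-region transitions along a sample path. The sketch below illustrates the idea for the three regions of $g(x)$ only; the breakpoints at $\pm 1$ and the function names are assumptions made for this example.

```python
def region(x):
    """Map a state to a macro-state index: 0 for x < -1, 1 for |x| <= 1, 2 for x > 1."""
    if x < -1.0:
        return 0
    if x <= 1.0:
        return 1
    return 2


def estimate_transitions(xs, n_states=3):
    """Estimate Pi[j][i] = P(next region i | current region j) from a sample path xs."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(xs[:-1], xs[1:]):
        counts[region(current)][region(following)] += 1
    Pi = []
    for row in counts:
        total = sum(row)
        # Rows for regions never visited are left at zero rather than guessed.
        Pi.append([c / total if total else 0.0 for c in row])
    return Pi


Pi_hat = estimate_transitions([-2.0, 0.0, 2.0, 2.0, 0.0, -2.0])
```

Rarely visited regions yield rows based on very few counts, which is why low probability transitions are the least accurate part of an experimentally derived matrix.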

SIMULATION RESULTS


The filter was simulated for a range of values of the parameters $b$, $h_0$, $h_1$, and $a$, while
$g_0 = 10$ and $g_1 = -0.2$ were held constant. The sample mean and variance of the error of the combined
filter (C) were compared to those of the extended Kalman filter (EKF) for 1000 time steps. Table 1 shows
the values of the parameters used in the simulation, as well as the simulation results. The results are
also shown in Figures 3-6.

The results indicate that, as expected, the new scheme performs best when the process noise variance,
which is determined by $b$, is high relative to the size of the unstable region but small relative to the
stable regions. For the range of values selected the filter always performs better than the extended
Kalman filter. It would perform worse if the observation were linear and the assumptions of the switched
Markov model were not satisfied for the process. The performance improvement is striking when the
observation nonlinearity is ambiguous, namely, when it has regions with negative slopes. The EKF in this
case cannot track the change in the region, while the combined filter is able to detect the proper region
when the consistency update is used, thus substantially reducing the uncertainty about what the true
macro-state is. For such a simple scalar problem, not much improvement is expected from increasing the
filter memory. However, the objective is to consider a multivariable nonlinear system to test the validity
of the filter when more than one memory level is used.


SUMMARY  AND  CONCLUSIONS 


This  paper  considered  a  suboptimal  filtering  scheme  for  the  nonlinear  estimation  problem  in  systems 
with  piecewise  linear  models  in  both  the  system  and  observations.  The  approximations  used  are  based  on 
utilizing  the  switched  Markov  model  for  the  system,  as  well  as  on  modifying  the  resulting  filter  with  the 
physical  constraints  of  the  states  of  the  model.  Additional  improvements  are  possible,  by  incorporating 
some  features  that  reflect  the  fact  that  the  transition  probability  matrix  has  special  characteristics 
involving  fast  and  slow  transitions.  Additional  properties  such  as  convergence  and  optimal  choice  of  the 
consistency  updating  function  need  further  investigations.  Applications  to  nonlinear  tracking  and 
guidance  problems  are  the  motivation  for  this  problem  as  most  such  scenarios  involve  highly  nonlinear 
geometry with a great deal of uncertainty.

Table 1. Sample Variance and Mean* for the Combined Filter and the EKF

Figure 4. Comparison of the Combined Filter Error Variance
and EKF Error Variance for a = 0.5, H_0 = 5, H_1 = 0.1

Figure 5. Comparison of the Combined Filter Error Variance
and EKF Error Variance for a = 2, H_0 = 1, H_1 = -0.1

APPENDIX G

LINEAR FILTERS FOR LINEAR SYSTEMS WITH MULTIPLICATIVE NOISE
AND NONLINEAR FILTERS FOR LINEAR SYSTEMS WITH NON-GAUSSIAN
ADDITIVE NOISE

Georgia Institute of Technology

Abstract

Exact  optimal  least  squares  linear  filters 
with  precomputable  gains  are  derived  for  the  class 
of  discrete  linear  systems  with  state  update  and 
output  corrupted  by  white  noise  multiplying  a 
linear  function  in  the  state. 

The  derived  method  is  then  applied  to  obtain 
suboptimal  nonlinear  filters  for  linear  systems 
with  non-Gaussian  additive  noise. 


1. Background and Notation


In this paper an exact linear least squares
filter for the general discrete system of the form

$$x_{k+1} = F_k x_k + G_k u_k + \sum_{i=1}^{q} w_k^{(i)} B_k^{(i)} x_k \tag{1}$$

$$y_k = H_k x_k + v_k + \sum_{i=1}^{q} w_k^{(i)} D_k^{(i)} x_k \tag{2}$$

is derived. The $u_k$, $v_k$ and $w_k$ are white second
order noise processes with zero mean and covariance

$$\operatorname{cov}\begin{pmatrix} u_k \\ v_k \\ w_k \end{pmatrix} = \begin{bmatrix} Q_k & S_k & \Delta_k \\ S_k' & R_k & \Phi_k \\ \Delta_k' & \Phi_k' & \Omega_k \end{bmatrix} \tag{3}$$

Their distribution is assumed to be symmetric about
zero. The superscript $(i)$ denotes the $i$-th component
of $w_k$. The random initial condition $x_0$ is
assumed to be uncorrelated with $u_k$, $v_k$ and $w_k$ and
has mean $\bar{x}_0$ and covariance $P_0$.

Because (1) and (2) involve nonlinear operations,
Gaussianness of the initial conditions and
the noise samples will in general not be conserved
in the state and output processes. Hence, we shall
omit assumptions on Gaussianness since they will not
simplify matters. For this reason the linear estimates
obtained will not be the conditional expectations
(conditioned on the observed data) and are
thus not necessarily the exact least squares estimates.
A special case of the model (1)-(3) arises
for instance if the measurement devices have a
fixed relative error (accuracy) (e.g., through the
use of logarithmic sensors). In this case $q = 1$ and
$D = H$. This situation also occurs if the equations
(1) and (2) model quantities computed in floating
point arithmetic [7]. In mechanical systems, the
bilinear terms may have their origin in vibrational
degradation. Another application area is the
filtering of linear systems with additive non-Gaussian
noise. This is explored in Section 3 of
this paper.

Earlier contributions in the area of non-Gaussian
filtering are based on a Bayesian approach
(Bucy [2]). Alspach and Sorenson [1] approximate
the densities by a Gaussian sum. Truncation
procedures have also been considered by Buxbaum
and Haddad [3]. Other approximations exist, for
example point masses, orthogonal functions, etc.
[5]. In general these filters are computationally
quite involved, and thus difficult to implement.
Hence the need for filters that are computationally
attractive as well as easy to understand and implement.
Masreliez [6] suggested two "nice" filters which
are restricted to either Gaussian state noise or
Gaussian observation noise.

Note that the entities $B$ and $D$ in (1) and
(2) are in fact (1,2)-tensors, written here in
terms of their component matrices. For notational
convenience we shall use the Kronecker product
notation [8]. (A tensor notation is too
cumbersome.) The last terms in (1) and (2) are
respectively written as

$$B_k (w_k \otimes x_k) \quad \text{and} \quad D_k (w_k \otimes x_k),$$

where $B_k = [B_k^{(1)} \; \ldots \; B_k^{(q)}]$ and $D_k = [D_k^{(1)} \; \ldots \; D_k^{(q)}]$.

In Section 2 the linear filter for the above
problem is derived. The usual form

$$\hat{x}_{k+1} = F_k \hat{x}_k + K_k (y_k - H_k \hat{x}_k) \tag{4}$$

is assumed, and the optimal gain schedule $\{K_k\}$ is
obtained. The results of this section are then
applied in Section 3 to yield suboptimal nonlinear
filters for the linear non-Gaussian problem.
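
The Kronecker product shorthand for the multiplicative terms can be checked numerically. The following sketch, in plain Python with small hypothetical matrices, verifies that $\sum_i w^{(i)} B^{(i)} x$ equals $[B^{(1)} \; B^{(2)}](w \otimes x)$ for a two-component $w$.

```python
def kron_vec(w, x):
    """Kronecker product of two vectors: blocks w[0]*x, w[1]*x, ..."""
    return [w_i * x_j for w_i in w for x_j in x]


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def hstack(blocks):
    """Place matrices side by side: [B1 B2 ...]."""
    return [sum((blk[r] for blk in blocks), []) for r in range(len(blocks[0]))]


# Hypothetical component matrices, noise sample, and state.
B1 = [[1.0, 0.0], [0.0, 2.0]]
B2 = [[0.0, 1.0], [1.0, 0.0]]
w = [0.5, -1.0]
x = [3.0, 4.0]

# Left side: sum over components of w_i * B_i * x.
lhs = [w[0] * a + w[1] * c for a, c in zip(matvec(B1, x), matvec(B2, x))]
# Right side: stacked matrix applied to the Kronecker product w (x) x.
rhs = matvec(hstack([B1, B2]), kron_vec(w, x))
```

Both sides evaluate to the same vector, which is the identity that allows the bilinear terms in the state and output equations to be written with a single stacked matrix.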


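2. Design of the Optimal Linear Filter


Defining the estimation error $\tilde{x}$ as $x - \hat{x}$, one
obtains from (1) and (4) the recursion

$$\tilde{x}_{k+1} = F_k \tilde{x}_k + G_k u_k - K_k \epsilon_k + B_k (w_k \otimes x_k) \tag{5}$$

where $\epsilon_k$ is the residual

$$\epsilon_k = y_k - H_k \hat{x}_k = H_k \tilde{x}_k + v_k + D_k (w_k \otimes x_k) \tag{6}$$

The equations (1), (5) and (6) can be combined to

$$\begin{bmatrix} \hat{x} \\ \tilde{x} \end{bmatrix}_{k+1} = \begin{bmatrix} F & KH \\ 0 & F - KH \end{bmatrix} \begin{bmatrix} \hat{x} \\ \tilde{x} \end{bmatrix}_k + \begin{bmatrix} 0 & K \\ G & -K \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}_k + \begin{bmatrix} KD \\ B - KD \end{bmatrix} (w \otimes x)_k \tag{7}$$

The covariance equation for (7) follows by "squaring up"
and taking expectations, together with the
fact that

$$E\, w_k (\hat{x}_k', \tilde{x}_k') = 0$$

and using standard Kronecker product identities
[8]. Denoting the components of the covariance as

$$E \begin{bmatrix} \hat{x} \\ \tilde{x} \end{bmatrix} \begin{bmatrix} \hat{x} \\ \tilde{x} \end{bmatrix}' = \begin{bmatrix} \Sigma & C \\ C' & P \end{bmatrix}$$

the update equations are (suppressing the subscript
$k$ on the right-hand sides)

$$\begin{aligned} P_{k+1} ={}& (F-KH) P (F-KH)' + GQG' - KS'G' - GSK' + KRK' \\ &+ (B-KD) [\Omega \otimes (P+C+C'+\Sigma)] (B-KD)' \\ &+ (B-KD) [(\Delta'G' - \Phi'K') \otimes \bar{x}] + [(G\Delta - K\Phi) \otimes \bar{x}'] (B-KD)' \end{aligned} \tag{8}$$

$$\begin{aligned} C_{k+1} ={}& FCF' + KHPF' - FCH'K' - KHPH'K' + KS'G' - KRK' \\ &+ KD [\Omega \otimes (P+C+C'+\Sigma)] (B-KD)' \\ &+ KD [(\Delta'G' - \Phi'K') \otimes \bar{x}] + [(K\Phi) \otimes \bar{x}'] (B-KD)' \end{aligned} \tag{9}$$

$$\begin{aligned} \Sigma_{k+1} ={}& F \Sigma F' + KHC'F' + FCH'K' + KHPH'K' + KRK' \\ &+ KD [\Omega \otimes (P+C+C'+\Sigma)] D'K' \\ &+ KD [(\Phi'K') \otimes \bar{x}] + [(K\Phi) \otimes \bar{x}'] D'K' \end{aligned} \tag{10}$$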

Direct optimization of the gain sequence is
possible by invoking the projection theorem.
Indeed the gain is optimal if and only if
the resulting error is uncorrelated (= orthogonal)
to the estimate. Hence setting the cross covariance
$C$ identically to zero in (9) yields a degenerate
equation for the gain. In terms of the generalized
innovations covariance

$$\bar{R} = HPH' + R + D[\Omega \otimes (P + \Sigma)]D' + D(\Phi' \otimes \bar{x}) + (\Phi \otimes \bar{x}')D' \tag{12}$$

the optimal gain is

$$K^{\mathrm{opt}} = \{ FPH' + GS + B[\Omega \otimes (P + \Sigma)]D' + (G\Delta \otimes \bar{x}')D' + B(\Phi' \otimes \bar{x}) \}\, \bar{R}^{-1} \tag{13}$$

Backsubstitution of this optimal gain in the update
equations (8) and (10) yields

$$\Sigma_{k+1} = (F \Sigma F' + K \bar{R} K')_k \tag{14}$$

$$P_{k+1} = \{ FPF' + GQG' - K \bar{R} K' + B[\Omega \otimes (P + \Sigma)]B' + B(\Delta'G' \otimes \bar{x}) + (G\Delta \otimes \bar{x}')B' \}_k \tag{15}$$

The initial conditions for (14) and (15) are
respectively

$$\Sigma_0 = \bar{x}_0 \bar{x}_0' \quad \text{and} \quad P_0 = P_0 .$$


The formulas (12) to (15) yield the most general
gain update. The gain dependency on the estimated
state $\bar{x}$ disappears if $\Phi$ and $\Delta$ are both zero, i.e.,
if the multiplicative noise $w$ is uncorrelated with
the purely additive noises $u$ and $v$. Note that the
formulas obtained are equivalent to the Kalman
filter formulas if the noise terms $Gu + B(w \otimes x)$ and
$v + D(w \otimes x)$ are considered as equivalent additive
noises $u$ and $v$.
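
As an illustration of the precomputable gain, the recursion can be specialized to a scalar system with mutually uncorrelated noises (cross-covariances between $u$, $v$ and $w$ all zero). The sketch below assumes the scalar model $x_{k+1} = f x_k + g u_k + b\,w_k x_k$, $y_k = h x_k + v_k + d\,w_k x_k$ with noise variances `q`, `r` and `om`; these names and the numerical values are hypothetical. Here `P` is the error second moment and `Sig` is the second moment of the estimate, so that `P + Sig` is the second moment of the state.

```python
def gain_step(P, Sig, f, g, h, b, d, q, r, om):
    """One step of the scalar gain recursion with uncorrelated noises."""
    m2 = om * (P + Sig)                   # variance of the product w * x
    R_bar = h * P * h + r + d * m2 * d    # generalized innovations variance
    K = (f * P * h + b * m2 * d) / R_bar  # optimal gain
    Sig_next = f * Sig * f + K * R_bar * K
    P_next = f * P * f + g * q * g - K * R_bar * K + b * m2 * b
    return K, P_next, Sig_next


# Without multiplicative noise (b = d = 0) the step is the Kalman predictor Riccati step.
K0, P0_next, S0_next = gain_step(1.0, 4.0, 0.9, 1.0, 1.0, 0.0, 0.0, 0.1, 0.2, 0.05)
# With multiplicative noise in both the state and the output equation.
K1, P1_next, S1_next = gain_step(1.0, 4.0, 0.9, 1.0, 1.0, 0.3, 0.2, 0.1, 0.2, 0.05)
```

None of the quantities depend on the data, so the whole gain sequence can be computed off line. A useful sanity check is that `P_next + Sig_next` always equals the open-loop second moment of the state, whatever the gain.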


3. Nonlinear Filter for Linear Systems
with Non-Gaussian Noise


Consider again the model (1), (2) but now with
$B$ and $D$ zero. It is assumed (without loss of
generality) that the noises $u$ and $v$, although non-Gaussian,
have a symmetric distribution. (The
asymmetries show up as biases in the odd moments.)
The same assumption is made for the initial distribution
$P(x_0 - \bar{x}_0)$. It is also assumed that $u$ and $v$
are independent, although this can be relaxed at the
expense of a higher complexity.

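The a priori estimate $\bar{x}$ satisfies the recursion
$\bar{x}_{k+1} = F \bar{x}_k$, with $\bar{x}_0$ given. Subtracting this predictable
part (trend), the a priori error $\tilde{x}_k = x_k - \bar{x}_k$
satisfies

$$\tilde{x}_{k+1} = F \tilde{x}_k + G u_k, \qquad \tilde{x}_0 = x_0 - \bar{x}_0 \tag{17}$$

$$\tilde{y}_k = H \tilde{x}_k + v_k, \qquad \tilde{y}_k = y_k - H \bar{x}_k \tag{18}$$

Define now the quantity $x^{[2]} = \tilde{x} \otimes \tilde{x}$, which satisfies

$$x^{[2]}_{k+1} = F^{[2]} x^{[2]}_k + (F \otimes G)(\tilde{x}_k \otimes u_k) + (G \otimes F)(u_k \otimes \tilde{x}_k) + G^{[2]} u_k^{[2]} \tag{20}$$

where $A^{[k]}$ is the $k$-th Kronecker power of $A$. The
prior expectation $\bar{X}$ of $x^{[2]}$ now satisfies

$$\bar{X}_{k+1} = F^{[2]} \bar{X}_k + G^{[2]} \bar{U} \tag{21}$$

where $\bar{U}$ is the column-stacked $Q$-matrix, $\bar{U} = \operatorname{cs}(Q)$.
Subtracting (21) from (20) yields the unbiased
form in $\tilde{X} = x^{[2]} - \bar{X}$, i.e.,

$$\tilde{X}_{k+1} = F^{[2]} \tilde{X}_k + (F \otimes G)(\tilde{x}_k \otimes u_k) + (G \otimes F)(u_k \otimes \tilde{x}_k) + G^{[2]} \tilde{U}_k \tag{22}$$

where $\bar{U}$ and $\tilde{U}$ are respectively $E u^{[2]} = \operatorname{cs} Q$ and
$u^{[2]} - E u^{[2]}$. The initial condition for (21) is

$$\bar{X}_0 = E(\tilde{x}_0 \otimes \tilde{x}_0) = \operatorname{cs} P_0 \qquad (\operatorname{cs} = \text{column stack})$$

Similarly we get

$$\tilde{Y}_k = H^{[2]} \tilde{X}_k + \tilde{V}_k + (H \otimes I)(\tilde{x}_k \otimes v_k) + (I \otimes H)(v_k \otimes \tilde{x}_k) \tag{24}$$

for

$$\tilde{Y}_k = \tilde{y}_k^{[2]} - E\, \tilde{y}_k^{[2]} \tag{25}$$

$$\tilde{V}_k = v_k^{[2]} - E\, v_k^{[2]} \tag{26}$$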

The equations (17), (18), (22) and (24) can be
combined to yield the augmented system (27)-(28).
Upon reordering the last terms in (27) and (28),
this augmented system is exactly of the form
considered in (1) and (2). The top blocks of the
tensors are zero, resulting in a one-way coupling
between $\tilde{x}$ and $\tilde{X}$. Here $(u', v')'$ takes the role of
the multiplicative noise $w$, hence in (3) we have

$$\Delta = \begin{bmatrix} Q & 0 \\ 0 & 0 \end{bmatrix}$$

The linear filter of Section 2 for this augmented
system yields least squares estimates for $\tilde{x}$ and $\tilde{X}$
(say $\hat{x}$ and $\hat{X}$) when driven by $\tilde{y}$ and $\tilde{Y}$ (which are
computable from the data). Thus the proposed
nonlinear filter consists of a feedforward model
for $\bar{x}_k$ and $\bar{X}_k$, which together with the data $y_k$
yields the driving terms $\tilde{y}_k$, $\tilde{Y}_k$ for the second order
filter. The filter itself follows the recursions
of the previous section. Having estimates of $\tilde{x}$ and
$\tilde{X}$, one can form a better estimate $x^*$ by solving

$$(x^* - \hat{x})'(P^{(1)})^{-1} + (x^{*[2]} - \hat{X})'(P^{(2)})^{-1}(x^* \otimes I + I \otimes x^*) = 0$$

STOCHASTIC REDUCED ORDER MODELING OF DETERMINISTIC SYSTEMS

Abstract

A novel approach to reduced order modeling
is given. "Artificial" noise is introduced
to reflect uncertainties in the reduced model,
and a performance measure is associated to
validate the approach. Connections with LQG
design are discussed.


1. Introduction

The standard reduced order modeling problem
is defined as follows. Consider the nth order
linear time invariant deterministic system

$$\dot{x} = Ax + Bu; \quad x(0) = x_0$$
$$y = Cx \tag{1}$$

where $u(t)$ is a known input (although it will
at times be assumed that $u(t)$ is a stochastic
process). The problem is to design a linear
system of the form

$$\dot{\hat{x}} = F\hat{x} + Gu; \quad \hat{x}(0) = \hat{x}_0$$
$$\hat{y} = H\hat{x} \tag{2}$$

of dimension $m < n$, with the objective to
approximate the output $y$ by $\hat{y}$ in some sense.

All existing model reduction methods for
deterministic systems, whether based on Markov
parameter matching, Hankel matrix reduction,
moment matching, Padé methods, balanced
realizations, etc., yield reduced models
which can be described by inherently
deterministic state space models of the form
(2). However, in all these cases a
deterministic part of the full model is
"deleted" and this results necessarily in
a loss of information. Uncertainty is thus
intrinsic in the reduced model but this is
never taken into account. This paper describes
a design method where this uncertainty is
"conserved" by artificially introducing
observation and process noises. This
uncertainty equivalence principle can of course
also be brought into the design of reduced
order models for stochastic systems. In the
latter case the stochastic parameters in the
system need to be appropriately augmented.

In this paper some plausibility arguments
will be given. A full detailed analysis is
deferred to a later paper.


2. A Bilinear Stochastic Model

The approach is based on the ideas of
"balancing" for open loop systems and the
LQG problem as developed by this author in
[2-4], as well as some information theoretic
ideas. Some model reduction methods are based
on the "projection of dynamics." The reduced
model uses for dynamics the projection of
the dynamics of the original system to the
subspace of the part of the state that is
of interest. In balanced realizations this
subspace is determined by inspection of the
canonical gramian [3]. Within this framework,
let a partitioning of $A$, $B$, $C$ be given and
let $(F, G, H)$ equal $(A_{11}, B_1, C_1)$. Let the
state $x$ also be partitioned as $(x_1', x_2')'$.
Finally we assume that $A$ is asymptotically
stable. We approach the problem now in several
steps.

Step 1: Assume that for the full order model
the initial (partial) state x_20 is unknown.
By lack of information of any kind, let us
take a probabilistic model, in which x_20 is
gaussian distributed with mean zero and
covariance X_2. This is motivated by the fact
that the gaussian distribution is the maximum
entropy distribution (i.e., least prejudiced)
given the second moments. Most system analysis
does indeed not go beyond second moments.
Another motivation stems from the ease in
dealing with gaussian distributions and this
assumption will certainly be an improvement
over merely setting x_20 = 0. Of course the
problem is now shifted to the determination
of X_2. Here we introduce a plausibility
argument, based on considerations of a
statistical ensemble of identical systems.
The state x_0 is set up by inputs prior to
t = 0. Again by lack of full knowledge, let
these inputs be white gaussian, N(0, q^2 I),
where q is now an unknown scalar. Next
we assume that the initial condition x_0 is
"typical" for a state set up by this white
gaussian noise after the system reached a
stochastic steady state. Assuming that (A,
B, C) is balanced [3], this implies that x_0 is
zero mean gaussian distributed with covariance
q^2 Λ, where Λ is the canonical gramian. If x_10 is known
the factor q^2 can be estimated by standard
statistical methods, e.g.,

\hat{q}^2 = \frac{1}{m-1} \sum_{i=1}^{m} \frac{x_i^2}{\lambda_i} \equiv q^2(x_{10}, \Lambda_1),  m \neq 1     (3)

where x_10' = (x_1, ..., x_m). Then the covariance
X_2 is q^2(x_10, Λ_1) Λ_2.
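
The estimate of step 1 can be illustrated numerically. The following is only a minimal sketch: it assumes a diagonal balanced gramian and the 1/(m-1) scaling suggested by (3), and the function names and the numerical values are hypothetical, chosen for illustration.

```python
def estimate_q2(x10, lam1):
    """Estimate q^2 from the known partial state x10 and the leading
    diagonal entries lam1 of the balanced gramian.
    Assumption: the 1/(m-1) scaling of (3), which requires m >= 2."""
    m = len(x10)
    if m < 2:
        raise ValueError("the estimate needs m >= 2")
    return sum(x * x / lam for x, lam in zip(x10, lam1)) / (m - 1)


def x2_covariance_diag(x10, lam1, lam2):
    """Diagonal of X_2 = q^2(x10, Lambda_1) * Lambda_2 (Lambda_2 diagonal)."""
    q2 = estimate_q2(x10, lam1)
    return [q2 * lam for lam in lam2]


# Hypothetical data: m = 2 retained states, 2 deleted states.
x10 = [2.0, -1.0]
lam1 = [4.0, 1.0]
lam2 = [0.25, 0.01]

q2 = estimate_q2(x10, lam1)                    # (4/4 + 1/1) / (2 - 1)
X2_diag = x2_covariance_diag(x10, lam1, lam2)  # q2 times each entry of lam2
print(q2, X2_diag)
```

The deleted states with small gramian entries receive a correspondingly small assumed variance, which is the sense in which the retained state "sets the scale" of the uncertainty.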

Step 2: Decouple the x_2 subsystem from the
full order model. This means that we substitute
the state x_2 by a stochastic variable \tilde{x}_2 which
is gaussian with covariance q^2(t) Λ_2 at each
time, where now q^2(t) = q^2(x_1(t), Λ_1) is as
discussed in step 1. Clearly, in the original
system x_2(t) is not wildly fluctuating, hence
it would be somewhat unrealistic to replace
x_2 by white gaussian noise. From the
partitioned equation for x_2, the essential
dynamics of x_2 are governed by A_22, and the
driving terms are both u and x_1. Here, the
approximation is to let \tilde{x}_2 be directly decoupled
from x_1 and u, and be modeled as colored gaussian
noise (with dynamics corresponding to A_22).
An "indirect" coupling is retained by letting
the covariance of \tilde{x}_2 correspond to x_1 in the
above described fashion. To this effect, let

\dot{\tilde{x}}_2 = A_{22} \tilde{x}_2 + v     (4)

where v is an assumed innovations process with
steady state covariance determined from the
Liapunov equation

cov(v) = -A_{22} (q^2 \Lambda_2) - (q^2 \Lambda_2) A_{22}' = q^2 B_2 B_2'     (5)

where the latter equality follows from the
balanced realization properties [3]. The system
(1) is then approximated stochastically by

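the bilinear reduced order model driven by
the (normalized) colored noise θ

\dot{\xi}_1 = A_{11} \xi_1 + B_1 u + A_{12} \theta     (6)

\eta = C_1 \xi_1 + C_2 \theta     (7)

where

\dot{\theta} = A_{22} \theta + \left[ \frac{\xi_1' \Lambda_1^{-1} \xi_1}{m-1} \right]^{1/2} B_2 \nu     (8)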
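
The balanced-realization identity behind (5) can be checked on a small example. The sketch below is illustrative only: it assumes a particular symmetric single-input single-output test realization, A_ij = -b_i b_j / (σ_i + σ_j) with C = B', for which both gramians equal diag(σ); this construction, the function names and the numbers are assumptions and do not come from the text.

```python
def balanced_siso(sigma, b):
    """Symmetric test realization whose reachability and observability
    gramians both equal diag(sigma) (assumed construction, C = B')."""
    n = len(sigma)
    return [[-b[i] * b[j] / (sigma[i] + sigma[j]) for j in range(n)]
            for i in range(n)]


def innovation_cov(A, sigma, m, q2):
    """Left side of (5): -A22 (q2 L2) - (q2 L2) A22' on the deleted block."""
    tail = range(m, len(sigma))
    return [[-q2 * (A[i][j] * sigma[j] + sigma[i] * A[j][i]) for j in tail]
            for i in tail]


sigma = [2.0, 1.0, 0.5, 0.1]   # hypothetical gramian entries
b = [1.0, -0.5, 0.3, 0.2]      # hypothetical input vector
m, q2 = 2, 1.5

A = balanced_siso(sigma, b)
lhs = innovation_cov(A, sigma, m, q2)
# Right side of (5): q2 * B2 B2'
rhs = [[q2 * b[i] * b[j] for j in range(m, 4)] for i in range(m, 4)]
max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_err)
```

Up to rounding, the two sides agree, so the innovations covariance needed for the colored noise model can be read off from B_2 and the scale factor q^2 alone.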

Step 3: The reduction is performed as a
simplification of the intervening colored noise.
Rather than modeling the unknown dynamics by
the process θ, we use now two correlated white
gaussian processes, w and v, with covariance

cov \begin{pmatrix} w \\ v \end{pmatrix} = \begin{pmatrix} W & M \\ M' & R \end{pmatrix}     (9)

The new model is as in (6)
and (7) but with A_12 θ and C_2 θ substituted by
w and v. Several criteria can now be chosen,
based on different design restrictions, to
select the covariances. If the output
approximation must be smooth, then set v =
0 and thus also R = 0 and M = 0. W can then
be chosen such that cov(η) = cov(y) (i.e.,
we retain the same uncertainty in the model).
If a wildly fluctuating model output is
tolerable, then equality of the output
covariances may be combined with the equality
of the correlation between process and
observation noise. It can for instance be
shown that if 2m < n and A_22 is nonsingular (which
must be true for the SISO balanced realization),
then many solutions exist.

Remark 1) The approximation in step 2 will
be better if the components A_12 x_2 and C_2 x_2
are "small" relative to respectively A_11 x_1
+ B_1 u and C_1 x_1. This is the basis for the
next section on LQG modeling.


The well known separation property for
the solution of the LQG problem breaks down
in the LQG modeling. (See [1] and [5])


The approach taken above has motivated
(quantitatively) that model reduction should
be accompanied by proper specification of
(artificial) process and measurement noise
covariances. Equivalently, the noise
covariances one selects in LQG design should
reflect the unmodeled dynamics. In order
that this approximation of unmodeled dynamics
by noise is tolerable, the variances of these
components should not be too large compared
to the "main components." From remark 1)
above it follows that for the balanced
realization, this will be guaranteed if a
performance index (P.I.)


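E \int \{ \|A_{12} x_2\|^2 + \alpha \|C_2 x_2\|^2 + \rho ( \|A_{11} x_1 + B_1 u\|^2 + \alpha \|C_1 x_1\|^2 ) \} \, dt     (12)

is introduced for some α > 0 and 0 < ρ <
1. But x_2 is not available in the reduced
model, so its covariance must be estimated
as in step 1. The end result is the P.I.
with integrand

(x_1'  u') \begin{pmatrix} \Pi & \rho A_{11}' B_1 \\ \rho B_1' A_{11} & \rho B_1' B_1 \end{pmatrix} \begin{pmatrix} x_1 \\ u \end{pmatrix}     (13)

where

\Pi = \frac{1}{m-1} \Lambda_1^{-1} \mathrm{Tr}[ (\alpha C_2' C_2 + A_{12}' A_{12}) \Lambda_2 ] + \rho ( A_{11}' A_{11} + \alpha C_1' C_1 )     (14)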
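
The step from (12) to the weighting matrix Π can be checked numerically. This is a sketch under stated assumptions: a scalar input u, a diagonal gramian, the 1/(m-1) scaling of step 1, and a hypothetical symmetric test realization (A_ij = -b_i b_j / (σ_i + σ_j), C = B') that is not taken from the text. All function names are illustrative.

```python
def balanced_siso(sigma, b):
    """Assumed symmetric test realization with gramians diag(sigma)."""
    n = len(sigma)
    return [[-b[i] * b[j] / (sigma[i] + sigma[j]) for j in range(n)]
            for i in range(n)]


def noise_trace(A, c, sigma, m, alpha):
    """Tr[(alpha C2'C2 + A12'A12) Lambda_2] for a diagonal Lambda_2."""
    n = len(sigma)
    return sum((alpha * c[j] ** 2 + sum(A[i][j] ** 2 for i in range(m))) * sigma[j]
               for j in range(m, n))


def build_pi(A, c, sigma, m, alpha, rho):
    """Weighting matrix Pi, assuming the 1/(m-1) scaling of step 1."""
    tr = noise_trace(A, c, sigma, m, alpha)
    Pi = [[rho * (sum(A[l][i] * A[l][k] for l in range(m)) + alpha * c[i] * c[k])
           for k in range(m)] for i in range(m)]
    for i in range(m):
        Pi[i][i] += tr / ((m - 1) * sigma[i])
    return Pi


def pi_integrand(Pi, A, b, x1, u, m, rho):
    """(x1' u') [[Pi, rho A11'B1], [rho B1'A11, rho B1'B1]] (x1; u)."""
    quad = sum(x1[i] * Pi[i][k] * x1[k] for i in range(m) for k in range(m))
    a11x = [sum(A[l][i] * x1[i] for i in range(m)) for l in range(m)]
    cross = sum(a11x[l] * b[l] for l in range(m))
    return quad + 2.0 * rho * cross * u + rho * sum(b[l] ** 2 for l in range(m)) * u * u


def direct_integrand(A, b, c, sigma, x1, u, m, alpha, rho):
    """Integrand of (12) with cov(x2) replaced by q^2(x1, Lambda_1) Lambda_2."""
    q2 = sum(x1[i] ** 2 / sigma[i] for i in range(m)) / (m - 1)
    main = sum((sum(A[l][i] * x1[i] for i in range(m)) + b[l] * u) ** 2
               for l in range(m))
    out = sum(c[i] * x1[i] for i in range(m)) ** 2
    return q2 * noise_trace(A, c, sigma, m, alpha) + rho * (main + alpha * out)


sigma = [2.0, 1.0, 0.5]        # hypothetical gramian entries
b = [1.0, -0.5, 0.3]
c = b[:]                       # C = B' for the symmetric test realization
m, alpha, rho = 2, 0.7, 0.4
A = balanced_siso(sigma, b)
Pi = build_pi(A, c, sigma, m, alpha, rho)
x1, u = [0.8, -1.1], 0.6
gap = abs(pi_integrand(Pi, A, b, x1, u, m, rho)
          - direct_integrand(A, b, c, sigma, x1, u, m, alpha, rho))
print(gap)
```

The two evaluations agree to rounding error, which confirms that the trace term in Π is exactly the expected contribution of the deleted state under the covariance model of step 1.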


For systems with slight nonlinearities,
standard perturbation methods yield linear
models. If again the perturbing
nonlinearity is "set to zero" a loss of
information results. We can then again match
the uncertainty.

In summary then, noise covariances for
the LQG model should reflect the degree of
accuracy of the assumed model. Qualitatively
speaking, in the modeling stage, model
uncertainty should be traded for purely
stochastic inputs which are simpler to deal
with in the analysis and design.

In order to validate the modeling, care
must be taken that the expected deviations
are not too large by accentuating the cost
of such deviations.
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ABSTRACT 

Model reduction invariably involves a trade off
between the available information and the simplicity of
the retained model. This information loss leads to an
uncertainty about the output of the true system given
the output of the reduced model for identical inputs.
In this paper a novel approach to the reduction problem
is given by incorporating this induced uncertainty in
the reduced model. Geometrically, the idea is to
construct a tube centered on the model output, to which
the actual system output belongs with a high degree of
confidence.


1. INTRODUCTION

Given an input-output description of a general
deterministic system, a realization can be given in
state space form. An essential feature (in fact, the
defining property) of the state is that it contains all
the information one needs to have about the past inputs
to the system in order to predict its output given the
future inputs. The concept is obviously equivalent to
that of a sufficient statistic in this case.

A deterministic state space model evolves in R^n
(or a submanifold of it), for some integer n, the
'order' of the system. A reduced model corresponds
with a lower dimensional (say m < n) state space model,
and it is clear that this reduced state cannot contain
the full information necessary to produce the exact
output. This loss of information entails an uncertainty
in the output of the reduced model regarding
the true model. To our knowledge, no existing model
reduction methods, whether based on Markov parameter
matching, Hankel matrix reduction, moment matching,
Padé methods, singular perturbation or balanced
realizations, etc., incorporate this inherent 'uncertainty'
due to the 'deletion' of a part of the true model.

This  paper  focuses  on  the  reduction  of  an  nth 
order  linear  time  invariant  system  with  dim  u  *  and 
dim  y  •  n0 


based on the ideas of 'balancing' [2], as well as some
information theoretic ideas. Clearly, this uncertainty
equivalence principle can be brought into the design of
reduced order models for stochastic systems. In the
latter case, the stochastic parameters in the system
need to be artificially augmented. The next section
describes an approach to open loop model reduction
based on balanced realizations which leads to a
bilinear stochastic reduced order model.

2.  A  BILINEAR  STOCHASTIC  MODEL 

Some model reduction methods (notably the ones
based on a modal decomposition and on balancing
techniques) are based on the 'projection of dynamics.' The
reduced model uses for dynamics the projection of the
dynamics of the original system to the subspace of the
part of the state that is of interest. In balanced
realizations this subspace is determined by inspection
of the canonical gramian [2]. Within this framework
[2], let a partitioning of A, B, C be given and let
(F, G, H) equal (A_11, B_1, C_1). Let the state x also be
partitioned as (x_1', x_2')'. For future reference, the
propagation of the first and second order moments for a
special bilinear realization is given. Its proof is
straightforward.

Theorem 1: Given the bilinear stochastic realization
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and  u^  is  deterministic  and  w^  is  a  standard  white 
Gaussian noise sequence then the updates of first and
second order moments are
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x(k+i)  -  Ax(k)  ♦  Bu(k)  !  x(0) 
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ntk) 


x (k) x ‘ (k)  +  P(k) 


(7) 


* 

m 

y(k)  -  Cx  (k) 

a 

P(k+1)  -  AP(k) A*  ♦  ww'Tr(nnik)) 

(8) 

• 

The  classical  problem  Is  to  design  a  linear  system 

X 

*  * 

0 

z(k+1) = Fz(k) + Gu(k);  z(0) = z_0     (2)

y (k)  -  Cx (k) 

(9) 

r, 

h 


\hat{y}(k) = Hz(k)

of dimension m < n, with the objective that \hat{y}
approximates y in some sense. As noted earlier, the model
(2) necessarily results in a loss of information, but a
'user' of the reduced model does not have knowledge of
this resulting uncertainty. A design method where this
uncertainty is 'conserved' by introduction of artificial
observation and process noises is outlined in
[3]. Uncertainty equivalence is established through
equality of certain covariances. The approach is


Py  (k)  •  CP(k)C’  +W’Tr(nn(k))  (10) 

The uncertainty equivalent modeling problem is now
approached in several steps. First, the uncertainty in
the full order model, given only a partial initial
state, is evaluated. In Step 2, an uncertainty equivalent,
one-way decoupled model is set up, leading to a
colored noise driven reduced order model. The last
step entails the actual simplification. Two methods
are suggested to approximate the colored noise model by
a white noise model.




AD-N1S4  CSS  APPROX  I  NATIONS  AND  IMPLEMENTATIONS  OF  NONLINEAR 

FILTERING  SCHEMES <U>  GEORGIA  INST  OF  TECH  ATLANTA 
SCHOOL  OF  ELECTRICAL  ENGINEERING  A  H  HADDAD  ET  AL. 
UNCLASSIFIED  FEB  88  AFATL-TR-87-73  F88E33-84-C-S273  F/Q  12/3 




2.1 Uncertainty Equivalence for Incomplete Knowledge of the State


Assume that for the full order model (1) in balanced
form, the initial (partial) state x_20 is unknown.
By lack of information of any kind and motivated by
the fact that a gaussian distribution is the maximum
entropy (i.e., least prejudiced) distribution given the
second moments, the uncertainty in x_20 is modeled by a
gaussian random vector with zero mean and covariance
X_2, yet to be specified. This assumption will certainly
be an improvement over merely assigning zero to
the components of x_20. In [3] it was argued, based on
statistical considerations, that a proper choice for
this covariance is

X2  "  a  X10A1  X1QA2 


“  *  <xio'VA2  <n> 

2.2 Decoupling of the x_2 Subsystem

The next approximation substitutes the state x_2
by a stochastic variable \tilde{x}_2 which is gaussian with
covariance q^2(t) Λ_2 at each time, where now q^2(t) =
q^2(x_1(t), Λ_1) is as discussed in step 1. From the
partitioned equation for x_2, the essential dynamics
(as captured by the correlation function) of x_2 are
governed by A_22 and the driving terms are both u and
x_1. Here, the approximation is to let \tilde{x}_2 be directly
decoupled from x_1 and u, and be modeled as colored
gaussian noise (with dynamics corresponding to A_22).
An 'indirect' coupling is retained by letting the
covariance of \tilde{x}_2 correspond to x_1 in the above
described fashion. To this effect, let


\tilde{x}_2(k+1) = A_{22} \tilde{x}_2(k) + v(k)


where v is an assumed innovation process with steady
state covariance determined from the Lyapunov equation

cov(v) = q^2 \Lambda_2 - A_{22} q^2 \Lambda_2 A_{22}'
       = q^2 ( B_2 B_2' + A_{21} \Lambda_1 A_{21}' )


The latter equality follows from the balanced
realization properties [2]. The system (1) is then
approximated stochastically by the bilinear reduced
order model driven by the colored noise θ

z(k+1) = A_{11} z(k) + B_1 u(k) + A_{12} \theta(k)     (14)


\eta(k) = C_1 z(k) + C_2 \theta(k)


2.3 White noise model

Rather than modeling the unknown dynamics by a
colored sequence, further simplification results if a
white noise sequence is used instead. There are two
ways in which one can proceed: (1) matching the cause
and (2) matching the effect. Causal matching is the
simplest. White noise sequences of matching covariance
are substituted wherever the colored sequence enters.
This leads to the innovations model

z(k+1) = A_{11} z(k) + B_1 u(k) + q(k) A_{12} \Lambda_2^{1/2} v(k)     (17)


\eta(k) = C_1 z(k) + q(k) C_2 \Lambda_2^{1/2} v(k)     (18)

where now v(k) is a standard white Gaussian sequence.
By Theorem 1 the stochastic model update equations are
then easily obtained.
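
Causal matching can be sketched in a few lines. The code below assumes a diagonal Λ_2, so that its square root is taken entry by entry, and uses hypothetical values for A_12, C_2, Λ_2 and q; the helper names are illustrative, not from the text. It forms the noise gains of (17)-(18) and checks that the process noise term has covariance q^2 A_12 Λ_2 A_12', the covariance of A_12 θ when θ has covariance q^2 Λ_2.

```python
import math


def causal_matching_gains(A12, C2, lam2, q):
    """Noise gains q A12 L2^(1/2) and q C2 L2^(1/2) for diagonal L2."""
    root = [math.sqrt(lam) for lam in lam2]
    Gw = [[q * a * r for a, r in zip(row, root)] for row in A12]
    Hv = [[q * c * r for c, r in zip(row, root)] for row in C2]
    return Gw, Hv


def gram(M):
    """Return M M'."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in M] for r1 in M]


# Hypothetical blocks for m = 2 retained and 2 deleted states.
A12 = [[0.2, -0.1], [0.05, 0.3]]
C2 = [[0.3, 0.2]]
lam2 = [0.5, 0.1]
q = 1.2

Gw, Hv = causal_matching_gains(A12, C2, lam2, q)
target = [[q * q * sum(A12[i][k] * lam2[k] * A12[j][k] for k in range(2))
           for j in range(2)] for i in range(2)]
GG = gram(Gw)
err = max(abs(GG[i][j] - target[i][j]) for i in range(2) for j in range(2))
print(err)
```

The same white sequence v(k) drives both the state and the output equation, so the process and observation noises of the matched model are fully correlated.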

Note that with this method a deviation of the
covariances of the outputs necessarily results between
the colored noise and the white noise model. In other
words, the 'effects' are changed. This suggests then
at once the alternative method of matching the effects.
The underlying idea is that the optimal (linear) filter
for the equivalent white noise model should not result
in a state covariance which is less than the covariance
for the filter for the colored noise system. The
problem is, however, that the optimal filtering problem
for the colored model involves the full order state,
and may, therefore, be computationally prohibitive.
After all we want a reduced model, not an approximation
of the same order! Using the results of Jain [1], a
suboptimal filter can be designed which yields an error
covariance which is guaranteed to be below a certain
bound. Only the covariance of the colored noise needs
to be known. We suggest the following scheme. Assume
a white noise model of the form,

z(k+1) = A_{11} z(k) + B_1 u(k) + q(k) w(k)     (19)


\eta(k) = C_1 z(k) + q(k) v(k)     (20)

The matrices Q = E(ww') and R = E(vv') are determined
in such a way that the innovations covariance of the
filter for (19)-(20) is the same as for the bound in
[1]. Details will be explained in a forthcoming paper.


® (k+1 )  -  A22«(k)  ♦  q(A2)Aj/2  *2 )(t*k* )  ’ 
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The  propagation  propertiea  for  the  model  follow 
then from Theorem 1 using appropriate matrices.


Remark: The approximation will be better if the components
A_12 x_2 and C_2 x_2 are 'small' relative to, respectively,
A_11 x_1 + B_1 u and C_1 x_1. This constitutes a basis
for the LQG modeling. The ideas are developed in [3].
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MODEL  REDUCTION  VIA  BALANCING,  AND  CONNECTIONS  WITH  OTHER  METHODS 


Erik I. Verriest

School  of  Electrical  Engineering 
Georgia  Institute  of  Technology 
Atlanta,  Georgia  30332 

ABSTRACT 

This  paper  starts  with  a  rather  philosophical  viewpoint  on  the  concepts  of 
modeling,  model  reduction,  and  randomness.  The  theory  of  open-loop  deterministic 
balancing  is  introduced  as  a  particular  implementation  of  a  model  reduction  scheme.  The 
discussion  focusses  on  the  choice  of  the  criterion.  Thus  motivated,  it  is  shown  that 
similar  ideas  can  be  employed  in  the  reduction  of  optimally  controlled  systems  under  the 
presence of noise, leading to the LQG-balanced realizations. This connects to the
stochastic balanced realizations. Finally, different stochastic realization algorithms
are cast in the common framework of the RV-coefficient, and the deeper geometric
significance  of  this  measure  is  explored. 

1 INTRODUCTION

1.1 Modeling and Model Reduction

Until recently, modeling has been to a large extent a heuristic and unrigorous
process where ad-hoc procedures abounded. For this reason, further attention and research
on this problem has been more than welcome. In effect, the first half of the eighties has
seen a proliferation in modeling and model reduction methods which are firmly based on
mathematical rigor (e.g. [1], [2], [3]).

The dichotomy between modeling and model reduction is rather weak and different
authors may provide different definitions. Perhaps the most intuitive notion is to let
modeling be the process whereby an abstract mathematical model is matched to the physical
world, and model reduction the process whereby a simpler mathematical model is derived
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from  an  existing  mathematical  model.  In  this  regard  modeling  is  what  is  usually  called 
"Identification", while model reduction belongs to the realm of Approximation Theory.

Keeping  in  mind  that  the  physical  world  and  thus  all  real-life  systems  are  the 
basic  entities,  perhaps  eluding  a  description  as  a  whole,  one  can  only  abstract  some 
aspects  of  its  behavior  and  model  these  properties  in  a  formal  theory.  In  what  follows 
then, it will be assumed that the "physical reality" is that which allows observation.
Modeling thus implies a procedure which formalizes in a mathematical abstraction some
aspects of the behavior of the physical entity. Clearly such a formalization cannot be
unique.

Together  with  the  mathematical  abstraction  (model)  one  must  give  its  scope,  i.e. 
which  aspects  of  the  physical  system  it  models.  Models  inherently  have  their  limitations. 
A linear small signal model of a transistor for instance, no matter how accurate its
parametrization, will be unable to predict the switching properties of digital transistor
circuits.  Clearly  then,  the  scope  of  the  model  should  be  matched  to  whatever  one  expects 
from  the  model.  In  the  study  of  the  kinematics  of  machinery,  there  is  no  need  to  apply 
the  theory  of  relativity,  but  in  the  study  of  particle  accelerators,  the  classical  theory 
no  longer  suffices. 

Once  the  scope  of  the  model  has  been  laid  down,  one  must  determine  the  accuracy 
of  that  model.  How  well  does  it  describe  the  domain-aspect  of  the  physical  reality?  A 
better  model  is  obviously  the  one  that,  given  the  same  domain,  predicts  the  behavior  of 
the  physical  system  more  accurately.  Biped  motion  can  be  crudely  modeled  with  a  ball-and- 
stick  model,  where  for  instance  each  stick  is  rigid,  and  perhaps  A  better  model 

would  be  the  one  incorporating  the  distributed  nature  of  the  masses,  actuators,  etc. 

Also,  one  must  be  able  to  explain  when  a  given  model  is  more  accurate,  or  better 
than  another  one.  More  specifically,  this  consists  in  finding  a  measure  for  the  accuracy, 
or  more  abstractly  a  suitable  topology,  with  a  meaningful  physical  interpretation,  so  that 
the approximation problem for models, within the same scope, is well defined.


Thirdly, another definitely more practical aspect of a formal theory is its
complexity. Roughly speaking, complexity refers to the number of ad-hoc rules
(postulates) that the theory requires, as well as the smallest number of parameters that
need to be specified a priori in order to obtain uniquely predictable (computable) answers
within the model. In a Newtonian mechanistic model the whole universe would be
predictable given the initial position, velocity and mass of each particle constituting
the universe. In this theory there is one basic postulate: the (Newtonian) universal law
of gravity, but the parameter set is ... well, very big indeed. Such a model would
clearly be impractical, if not unfeasible, if one were interested in studying the
dynamics of the solar system, or the kinetics of a gas in a container.


To  summarize,  every  formal  modeling  theory  should  be  accompanied  by  these  three 


quantifiers: 

-its  domain  of  validity 

-its  predictability  or  accuracy 

-its  complexity 

Hence, there exists only a partial ordering between models, and blanket statements such as
"Model A is better than model B." definitely do not make any sense without any indication of
these three quantifiers. Even given the quantifiers, different models may simply not be
comparable. Whether one favors a general model of large scope, or several specialized
ones of lower scope and complexity is now more a matter of personal taste. Of course the
particular purpose or objective of the model should influence such a choice.

Model  reduction  problems  aim  at  reducing  the  complexity  although  there  generally 
is  a  trade  off  with  the  accuracy  and  the  scope.  Within  the  established  mathematical 
framework,  this  resorts  to  finding  a  more  attractive  subset  of  the  space  which  is  dense 
(with  respect  to  the  topology)  in  the  given  space,  as  for  instance  in  polynomial 
approximation.  Alternatively,  it  may  mean  the  search  for  a  lower  dimensional  subspace  of 
the given space, e.g. finite element approximations of distributed systems. At any rate,
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the  modelers  dream  is  to  come  up  with  a  mathematical  model,  which  is  suitably  small  in 
order  to  allow  computable  predictions  of  the  reality. 

Model  reduction  can  be  accomplished  in  many  ways:  For  example,  suppose  that  one 
has  a  large  dimensional  system,  perhaps  of  weakly  interacting  subsystems.  Existing 
techniques  find  a  lower  dimensional  model,  e.g.  by  aggregation.  There  is  no  doubt  that 
the  result  will  be  a  simpler  model.  On  the  other  hand,  one  could  take  the  opposite 
approach,  and  let  the  number  of  weakly  interacting  subsystems  approach  infinity,  only  to 
realize  a  statistical  or  probabilistic  description  of  the  system.  Such  a  probabilistic 
description may result in a smaller number of parameters (e.g. first and second order
moments). In fact, this is exactly the approach of statistical dynamics. Again, which
approach is favorable will depend on the purpose of the model. If one is only interested
in the average behavior of the system, then the statistical description may be
preferable. One does not need to know the detailed trajectories of the gas molecules in
order to understand the workings of an internal combustion engine.

1.2  Stochastic  Models  and  the  Origins  of  Randomness 

In  the  previous  paragraph,  we  already  hinted  at  building  statistical  models. 
The  observed  data  set  on  which  one  tries  to  model  some  behavior,  typically  shows 
fluctuations.  These  fluctuations  arise  from  two  origins: 

i)  Some  variables  (parameters)  of  the  system  may  be  random.  In  this  sense  the  resulting 
probabilities  are  unambiguously  defined,  i.e.  the  randomness  is  imposed  from  the 
"outside" (e.g. random boundary conditions).

ii)  Randomness  can  be  introduced  in  an  arbitrary  way,  to  reflect  our  incomplete  knowledge 
of  an  exact  description  of  a  system.  For  instance  this  can  be  due  to  uncertainties  of  a 
real  probabilistic  nature  (e.g.  quantum  uncertainty).  This  uncertainty  further  arises 
when  the  number  of  variables  is  so  large  that  a  correct  description  would  be  practically 
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impossible.  Randomness  is  then  used  to  replace  a  knowledge  which  is  too  detailed  to  be 
useful  in  practice. 

A practical methodology for discarding information can then be organized as
follows:

-Retain only a few simple features which seem relevant to the problem (e.g.
based on the different physical consequences that result from the different ways
of complexity reduction).

-Give a probabilistic description. (This allows statistical predictions, despite
the incomplete information).

-Compute observed quantities from within this model and compare these with
experimental results. Here the "scope" and the "accuracy" are tested, thus
allowing "feedback" or interaction in the modeling procedure.


A fundamental assumption is the MARKOV assumption, which is justifiable as
follows:


The  large  set  of  variables,  giving  an  exact  complete  microscopic  description  of  the  system 
can  be  divided  in  two  classes,  according  to  their  relaxation  times.  If  a  first  set  {x} 
has  relaxation  times,  much  greater  than  all  the  other  variables  in  the  second  class,  then 
the  timescale  of  the  description  (amounting  to  the  scope  of  the  model),  is  chosen 
intermediate  to  the  long  and  short  relaxation  times.  Hence,  all  memory  effects  are 
accounted  for  by  the  variables  {x},  and  it  is  adequate  to  assume  that  they  form  a  Markov 
process. 

Another  frequently  made  assumption  is  that  of  STATIONARITY,  implying  that 

i)  all  external  influences  on  the  system  are  time-independent  on  the  chosen  timescale 

ii)  the  classification  of  all  variables  in  "fast"  and  "slow"  is  preserved  during  the 
evolution  of  the  system. 
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2  OPEN  LOOP  BALANCING 
2.1  Reachability  and  Observability 


A state space model of a continuous time (the theory for discrete time systems
is very similar and omitted) linear system with m inputs and p outputs is characterized by
a triple of matrices (F,G,H)


F \in \mathbb{R}^{n \times n};  G \in \mathbb{R}^{n \times m};  H \in \mathbb{R}^{p \times n}


where n is the order of the system. In general, the matrices are indexed by the reals \mathbb{R}.
For continuous time systems the relations are


\dot{x}(t) = F(t) x(t) + G(t) u(t)     (2.1)


y(t) = H(t) x(t)     (2.2)

If F, G and H are invariant with time, it is well known [4] that the
reachability and observability of the system are determined by the full rank of the
reachability and observability matrices, respectively [G, FG, ..., F^{n-1}G] and [H', F'H', ...,
(F')^{n-1}H']. However, the rank defect of a matrix is very difficult to determine numerically
because of the finite precision arithmetic of all computers. Moreover, these criteria do
not provide any means to attach a measure of the degree of observability or reachability
of the given system. A quantitative measure of the reachability (\mathcal{R}) or observability (\mathcal{O})
in some interval (t_0, t_f) is obtained via the (weighted) Gramian matrices, defined as (\Phi
(.,.) is the transition matrix of F):

\mathcal{O}_W[t_0, t_f] = \int_{t_0}^{t_f} \Phi'(\tau, t_0) H'(\tau) W(\tau) H(\tau) \Phi(\tau, t_0) \, d\tau     (2.3)
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(2.4) 


\mathcal{R}_M[t_0, t_f] = \int_{t_0}^{t_f} \Phi(t_f, \tau) G(\tau) M^{-1}(\tau) G'(\tau) \Phi'(t_f, \tau) \, d\tau

Note  that  these  matrices  are  well  defined  also  in  the  timevarying  case,  as  long  as  the 
integrals  converge.  The  matrices  W(t)  and  M(t)  are  assumed  to  be  (positive  or  negative) 
definite,  (usually  identity).  An  interesting  interpretation  of  these  Gramians  as  weighting 
matrices  for  energies  and  uncertainties  is  given  in  the  following  subsection.  In  fact, 
this  interpretation  forms  the  basis  for  the  model  reduction  algorithms  to  be  introduced  in 
the  next  subsection.  The  last  subsection  then  describes  the  properties  of  the  so-called 
balanced  realizations,  which  were  first  introduced  by  Moore  [5]  for  the  time-invariant 
case. 
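
The remark on finite precision in the rank test can be made concrete with a two-state example. This is only an illustrative sketch with hypothetical numbers: F = diag(-1, -1 - eps) and G = (1, 1)' give a reachability matrix [G, FG] with determinant -eps, so the pair is reachable in exact arithmetic for every eps > 0.

```python
def reachability_det(eps):
    """Determinant of [G, FG] for F = diag(-1, -1 - eps), G = (1, 1)'.
    In exact arithmetic the value is -eps."""
    f1, f2 = -1.0, -1.0 - eps
    g1, g2 = 1.0, 1.0
    R = [[g1, f1 * g1], [g2, f2 * g2]]
    return R[0][0] * R[1][1] - R[0][1] * R[1][0]


for eps in (1e-1, 1e-8, 1e-17):
    print(eps, reachability_det(eps))
```

For eps = 1e-17 the two diagonal entries of F are indistinguishable in double precision and the computed determinant is exactly zero, although the pair is reachable in theory. A yes/no rank test therefore says little about how reachable a system is, which motivates the quantitative Gramian measures.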

2.2  Interpretation  of  the  Gramians. 

2.2.1  Deterministic 

We start with simple thought experiments. Assume that the relevant input and
output signals are in L_2. Let the system be in the state x_0 initially. The output of the
undriven system is

y(t) = H(t) \Phi(t, t_0) x_0     (2.5)

In general, the weighted $L_2$-norm is a particular measure of the "strength" in the signal,
even though there may not be an underlying energy in a physical sense. The cross terms
measure the degree of "interference" between the different components. Also, it is always
possible to renormalize or take linear combinations of the existing output signals that
have a more direct physical interpretation in terms of energy. Equivalently, one can
define a weighting matrix $W$ for the outputs, thus effectively measuring the "energy" as a
weighted $L_2$-norm. With this generalization, the available $W$-measured output energy $U_W$ in
the interval $t_0$ to $t_f$ for a system in state $x_0$ at time $t_0$ is given by

$$\begin{aligned}
U_W &= \int_{t_0}^{t_f} y(t)'\, W(t)\, y(t)\, dt \\
    &= \int_{t_0}^{t_f} x_0'\, \Phi'(t, t_0)\, H'(t)\, W(t)\, H(t)\, \Phi(t, t_0)\, x_0\, dt \\
    &= x_0'\, \mathcal{O}_W[t_0, t_f]\, x_0
\end{aligned} \qquad (2.6)$$

The generalized Observability Gramian $\mathcal{O}_W[t_0, t_f]$ is a weighting matrix for the output $L_2$-
measure given the initial state. If the system is observable (i.e. $\mathcal{O}_W$ nonsingular), the
state $x_0$ can be recovered as (assuming that the system is undriven in $[t_0, t_f]$)

$$x_0 = \left(\mathcal{O}_W[t_0, t_f]\right)^{-1} \int_{t_0}^{t_f} \Phi'(t, t_0)\, H(t)'\, W(t)\, y(t)\, dt \qquad (2.7)$$

Consider now the dual problem of determining the inputs which drive the system from
the zero state at $t_0$ to any arbitrary state $x_f$ at $t_f$. If the matrix $\mathcal{R}_M[t_0, t_f]$ is
nonsingular, then a particular input achieving this is

$$u(t) = M(t)^{-1}\, G(t)'\, \Phi'(t_f, t)\, \left(\mathcal{R}_M[t_0, t_f]\right)^{-1} x_f \qquad (2.8)$$


The  optimality  properties  of  this  input  are  well-known  [6].  It  is  the  input  with  the 
least  amount  of  "energy",  as  measured  in  a  M-weighted  L2-norm. 


The  corresponding  minimal  energy  is 
$$U_R = x_f'\, \left(\mathcal{R}_M[t_0, t_f]\right)^{-1} x_f \qquad (2.10)$$

Again  we  see  the  role  played  by  the  M-weighted  Gramian  matrix.  Its  inverse  appears  as  a 
weighting  matrix  for  the  minimal  steering  effort  to  the  state  Xf  from  the  zero  state. 
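
As a rough numerical illustration of (2.4), (2.8) and (2.10), the following Python sketch steers a double integrator from the zero state to a target state. The example system, the horizon, the target `xf`, and the helper names `phi` and `trapezoid` are assumptions made for illustration only and do not come from the report. The weight M is taken as the identity, and the integrals are approximated with a simple trapezoidal rule.

```python
import numpy as np

def phi(s):
    """Transition matrix exp(F s) of the double integrator F = [[0, 1], [0, 0]]."""
    return np.array([[1.0, s], [0.0, 1.0]])

def trapezoid(samples, grid):
    """Trapezoidal rule over the first axis of `samples`."""
    steps = np.diff(grid)
    return np.tensordot(steps, 0.5 * (samples[1:] + samples[:-1]), axes=(0, 0))

G = np.array([[0.0], [1.0]])      # input matrix of the illustrative system
M_inv = np.array([[1.0]])         # M = I (assumed)
t0, tf = 0.0, 2.0
ts = np.linspace(t0, tf, 2001)

# Reachability Gramian (2.4) on [t0, tf].
gram = trapezoid(
    np.array([phi(tf - t) @ G @ M_inv @ G.T @ phi(tf - t).T for t in ts]), ts)

# Steering input (2.8) towards the target state xf.
xf = np.array([1.0, 0.0])
lam = np.linalg.solve(gram, xf)
u = np.array([M_inv @ G.T @ phi(tf - t).T @ lam for t in ts])   # shape (N, 1)

# State reached at tf from the zero state, and the input energy (M = I).
x_reached = trapezoid(np.array([phi(tf - t) @ G @ ut for t, ut in zip(ts, u)]), ts)
energy = trapezoid(np.array([ut @ ut for ut in u]), ts)

print("Gramian:\n", gram)
print("reached state:", x_reached)
print("input energy:", energy, " x_f' R^-1 x_f:", xf @ lam)
```

Because the same quadrature rule is used for the Gramian and for the simulated response, the reached state agrees with `xf` up to rounding, and the input energy agrees with the quadratic form in (2.10).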

2.2.2  Stochastic. 

Here also we start with two thought experiments, one characterizing
"uncertainties" relating to the inputs to the system, the other relating the state
uncertainty to the outputs.

Let  the  system  be  driven  by  a  white  gaussian  (vector)  input  signal,  of  zero 
mean,  and  covariance  matrix  Q(t).  Assuming  that  this  input  is  uncorrelated  with  the 
initial  state  of  the  system,  the  state  covariance  matrix  P(t)  at  time  t  is  given  by 

$$P(t) = \Phi(t, t_0)\, P(t_0)\, \Phi'(t, t_0) + \mathcal{R}_{Q^{-1}}[t_0, t] \qquad (2.11)$$

The first term equals the covariance $\Pi(t, t_0)$ for the free-running undisturbed system.
The second term is the generalized $Q^{-1}$-weighted Reachability Gramian (2.4) for $M = Q^{-1}$.
It is a measure of the uncertainty induced in the state by a maximally random input. The
disturbability of the state (as measured by the covariance) in the direction $d$ by a white
Gaussian input is given by

$$d'P(t)\,d = \mathrm{Tr}\, D P(t) \qquad (2.12)$$

where $D = dd'$ and Tr is the trace function. The expected value of the $A$-weighted state
"energy" in the realization is

$$E\, x(t)'A(t)\,x(t) = \mathrm{Tr}\, A(t)P(t) = \mathrm{Tr}\, A\,\Pi(t, t_0) + \mathrm{Tr}\, A\,\mathcal{R}_{Q^{-1}}[t_0, t] \qquad (2.13)$$

Here $\mathcal{R}_{Q^{-1}}[t_0, t]$ appears as a weighting for $D(t)$ and $A(t)$ under the trace-norm. The second
term in (2.13) is interpreted as the average energy increase in the states of the given
realization due to the process noise with covariance $Q(t)$.
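
The decomposition (2.11) can be checked on a scalar example. The sketch below is only an illustration under assumed data: the system dx/dt = f x + g w, the numerical values, and the function names `closed_form` and `integrate_lyapunov` are hypothetical. It compares the sum "propagated initial covariance plus Gramian" with a direct Runge-Kutta integration of the covariance equation dP/dt = 2 f P + g^2 q.

```python
import math

# Illustrative scalar system (assumed): dx/dt = f x + g w, white noise w with covariance q.
f, g, q = -0.5, 1.0, 2.0
P0, t0 = 1.0, 0.0

def closed_form(t):
    """Right-hand side of (2.11): Phi P0 Phi' plus the Q^{-1}-weighted reachability Gramian."""
    phi = math.exp(f * (t - t0))
    gramian = g * g * q * (math.exp(2.0 * f * (t - t0)) - 1.0) / (2.0 * f)
    return phi * P0 * phi + gramian

def integrate_lyapunov(t, steps=3000):
    """Classical RK4 integration of dP/dt = 2 f P + g^2 q from t0 to t."""
    h = (t - t0) / steps
    rhs = lambda P: 2.0 * f * P + g * g * q
    P = P0
    for _ in range(steps):
        k1 = rhs(P)
        k2 = rhs(P + 0.5 * h * k1)
        k3 = rhs(P + 0.5 * h * k2)
        k4 = rhs(P + h * k3)
        P += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return P

print(closed_form(3.0), integrate_lyapunov(3.0))
```

For large t the first term dies out and the covariance tends to the stationary Gramian value g^2 q / (-2 f).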

Finally, consider the state estimation problem for a system with observation noise,
but no driving terms. If the measurement noise is white with covariance matrix $R(t)$, and,
for simplicity, assumed to be uncorrelated with the initial state $x_0$, then the (Kalman
filter) solution to the problem leads to the classical result ($S_0 = P(t_0|t_0)^{-1}$)

$$P(t_0|t)^{-1} = S_0 + \mathcal{O}_{R^{-1}}[t_0, t] \qquad (2.14)$$

The matrix $\mathcal{O}_{R^{-1}}[t_0, t]$ is a (matrix valued) measure for the information (or $\mathcal{O}_{R^{-1}}^{-1}[t_0, t]$
for the uncertainty) conveyed by the observation process in $(t_0, t)$ about the initial
state $x_0$, and is usually referred to as the "Information matrix" in the estimation
literature. In particular, if there is no prior information, $P(t_0|t_0)^{-1}$ is the zero
matrix, and

$$P(t_0|t) = \left(\mathcal{O}_{R^{-1}}[t_0, t]\right)^{-1}$$

The above illustrates in a simple way how $\mathcal{R}^{-1}$ and $\mathcal{O}$ relate to (generalized) energies,
while their inverses $\mathcal{R}$ and $\mathcal{O}^{-1}$ have to do with "uncertainties". The lower the required
(minimal) energy to reach a certain state is, the more "reachable" that state is.
Similarly, the higher the output energy available from the system, the more information we
have about that system, and the smaller the error covariance of the filtered initial state.
The Gramians provide, therefore, a suitable measure for the degrees of reachability and
observability in a system.


With  these  remarks  serving  as  a  motivation,  we  proceed  to  the  formal  definitions. 


2.3  Balanced  Realizations 


2.3.1  The  Canonical  Gramian 


Under a similarity transformation of the state space form of a system, the
quantitative reachability and observability properties of a realization are changed.
Indeed if $T(t)$ is the (nonsingular) transformation

$$T: (F, G, H) \rightarrow (TFT^{-1},\, TG,\, HT^{-1})$$

then the Gramians for the new realization are

$$T: \mathcal{R}[t_0, t_f] \rightarrow T(t_f)\, \mathcal{R}[t_0, t_f]\, T(t_f)'$$

$$T: \mathcal{O}[t_0, t_f] \rightarrow T(t_0)^{-T}\, \mathcal{O}[t_0, t_f]\, T(t_0)^{-1}$$

It has been shown [7] that if the matrices F, G, H and the weights W and M are real
analytic functions of time, and if the system is completely reachable and observable, then
a similarity transformation exists such that both $\mathcal{R}$ and $\mathcal{O}$ are diagonal and equal. If the
diagonal elements are separated, then one can define a (unique) canonical form by inducing
some ordering in these elements. In the time invariant case, it is customary to order
them according to decreasing magnitude. In the sequel we shall refer to these as the
CANONICAL ELEMENTS, and to the Gramians in balanced form as the CANONICAL
GRAMIAN. The open loop canonical Gramian will be denoted as $\Lambda$. (In the signal processing
and digital filtering context, the canonical elements are also known as the second order
modes [8]). The resulting realization $(TFT^{-1}, TG, HT^{-1})$ is then called "balanced with
respect to the weights W and M". (Usually, balancedness is considered only with respect
to the weights $W = M = I$). Algorithms for obtaining such realizations are based on the
singular value decomposition of the Gramians in an arbitrary realization ([5], [7]).
Recently more direct methods have been obtained for computing the balanced realizations
for time invariant systems ([9], [10], [11], [12], [13]). The time-varying balanced
realizations are an extension of the original balanced realizations for time invariant
systems introduced by Moore, and were introduced in [7]. In view of the interpretations
of the Gramians developed in the previous section, it is clear that in each coordinate
direction of the balanced state space the degree of reachability and observability is the
same.


2.3.2  Model  Reduction  via  Balancing 


In order to fix the ideas on how this might be used as a criterion for model
reduction, consider a nonminimal time-invariant state space realization of a system. It
is well known that such a realization is nonreachable and/or unobservable [4]. A minimal
realization, having identical input-output properties if the system is initially at rest,
can be obtained by removing these unobservable or nonreachable modes from the original
description, e.g. via a truncation (projection of dynamics) of the standard decomposition
of the nonminimal system, thus effectively deleting the unreachable and unobservable
parts of the state space. The Gramians now give a quantitative measure for observability
and reachability, rather than the binary value assigned by the (Kalman) criterion. If one
were now to "delete" a component which has a high cost associated with its reachability,
then this component may have very good observability properties, and therefore be very
significant in the input-output description of the system. Since in fact the state
intervenes only as an interface between input and output, a transformation (for instance
just a scaling) can be used to yield a new representation in which this difficult-to-reach
state component has become very easy to reach. The opposite would then also be true for
its observability properties. Clearly, component reachability and component observability
are not absolute criteria for the importance towards the input-output or external
description. What one needs to look for is invariants with respect to arbitrary state
space transformations. The product of the reachability and observability Gramians


transforms under T as a similarity. Hence the eigenvalues of this product are invariants.
But these eigenvalues are exactly the squares of the canonical elements. Hence it turns
out that the relative importance of a (balanced) state space component with respect to the
external system behavior is quantitatively determined by the "joint degree of reachability
and observability" associated with this system dimension. By virtue of the
interpretations we developed in the previous section, this is exactly described by the
elements of the Gramians of the balanced realization. Based on this description, the
canonical elements can be used by the system analyst or control designer to decide which
components to use in a reduced order model for the original system. Such a reduced order
model is then obtained by "projection of dynamics". One partitions the original system
(in balanced form) as

$$F = \begin{bmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{bmatrix}, \qquad
G = \begin{bmatrix} G_1 \\ G_2 \end{bmatrix}, \qquad
H = \begin{bmatrix} H_1 & H_2 \end{bmatrix}$$

where it is assumed that the canonical elements are ordered with respect to their
magnitude. The reduced order model is then obtained as $(F_{11}, G_1, H_1)$. It simply means
that the components which were difficult to control and observe are considered as
completely uncontrollable and unobservable, and subsequently, the minimal realization is
obtained. In reference to section 1, our topology is derived from the trace of the
canonical Gramian, under the restriction of "Projection of Dynamics". The above
"projection-of-dynamics" method is also applied in the discrete case. However, it leads
to some self-inconsistency, since the reduced models of the discrete balanced system are
themselves not balanced. Similar interference effects are also known in realization
theory. For instance, the reduction of a Hankel matrix via a singular value decomposition
does not yield a matrix with Hankel structure in general. This has been a steady source
of criticism of the method.
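
The invariance argument used above can be illustrated numerically. The sketch below is a minimal example under assumptions: the time-invariant system matrices, the transformation `T`, and the helper `lyap` (a small Kronecker-product Lyapunov solver for infinite-horizon Gramians with identity weights) are all hypothetical choices, not taken from the report. It shows that each Gramian changes under a similarity transformation while the eigenvalues of their product do not.

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A' + Q = 0 using the row-major vectorized (Kronecker) form."""
    n = A.shape[0]
    K = np.kron(A, np.eye(n)) + np.kron(np.eye(n), A)
    return np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)

# Illustrative stable system (assumed).
F = np.array([[-1.0, 0.5, 0.0], [0.0, -2.0, 1.0], [0.0, 0.0, -3.0]])
G = np.array([[1.0], [0.5], [1.0]])
H = np.array([[1.0, 0.0, 2.0]])

R = lyap(F, G @ G.T)        # reachability Gramian
O = lyap(F.T, H.T @ H)      # observability Gramian

# An arbitrary nonsingular change of coordinates (assumed).
T = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.5]])
Ti = np.linalg.inv(T)
F2, G2, H2 = T @ F @ Ti, T @ G, H @ Ti

R2 = lyap(F2, G2 @ G2.T)
O2 = lyap(F2.T, H2.T @ H2)

eig_before = np.sort(np.linalg.eigvals(R @ O).real)
eig_after = np.sort(np.linalg.eigvals(R2 @ O2).real)

print("R changes:", not np.allclose(R, R2))
print("eigenvalues of the product before:", eig_before)
print("eigenvalues of the product after: ", eig_after)
print("canonical elements:", np.sqrt(np.abs(eig_before))[::-1])
```

The square roots of these invariant eigenvalues are the canonical elements that the projection of dynamics uses to rank the state components.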
2.3.3 Properties of Balanced Realizations


A basic property of the Gramians introduced in section 2.1 is that they satisfy the
following Lyapunov equations (without loss of generality, we take M and W as the identity
matrix; $t_0$ and $t_f$ fixed):

$$\frac{d}{dt}\mathcal{R}[t_0, t] = F(t)\,\mathcal{R}[t_0, t] + \mathcal{R}[t_0, t]\,F(t)' + G(t)G(t)' \qquad (2.15a)$$

$$\mathcal{R}[t_0, t_0] = 0 \qquad (2.15b)$$

$$\frac{d}{dt}\mathcal{O}[t, t_f] = -F(t)'\,\mathcal{O}[t, t_f] - \mathcal{O}[t, t_f]\,F(t) - H'(t)H(t) \qquad (2.16a)$$

$$\mathcal{O}[t_f, t_f] = 0 \qquad (2.16b)$$

More general formulas for $t_0$ and $t_f$ depending on $t$ have been obtained in [7]. They are of
interest in "Sliding Interval" Balancing and Model Reduction. It follows that for the
balanced realization of a time invariant system with $t_0 = -\infty$ and $t_f = \infty$, the
canonical Gramian $\Lambda$ satisfies the symmetrical equations

$$F\Lambda + \Lambda F' + GG' = 0 \qquad (2.17)$$

$$F'\Lambda + \Lambda F + H'H = 0 \qquad (2.18)$$

These equations form the basis for the derivation of a whole set of useful properties of
balanced realizations (see [9], [11], and [14]). For example, for some signature matrix $E$ (a
diagonal matrix having either +1 or -1 as diagonal elements) one can show that for SISO
systems

$$EFE = F' \; ; \quad EG = H' \qquad (2.19)$$
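
The properties (2.17) to (2.19) and the projection of dynamics can be tried out on a small example. The sketch below uses a standard square-root construction (Cholesky factors of the two Gramians followed by a singular value decomposition); this is one common way to balance a stable, minimal, time-invariant system and is not claimed to be the algorithm of the cited references. The SISO example system and the helper names `lyap` and `balance` are assumptions for illustration.

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A' + Q = 0 (Kronecker form); symmetrize against rounding."""
    n = A.shape[0]
    K = np.kron(A, np.eye(n)) + np.kron(np.eye(n), A)
    X = np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)

def balance(F, G, H):
    """Return a balanced realization and the canonical elements (largest first)."""
    R = lyap(F, G @ G.T)                  # reachability Gramian, W = M = I
    O = lyap(F.T, H.T @ H)                # observability Gramian
    Lr = np.linalg.cholesky(R)
    Lo = np.linalg.cholesky(O)
    U, s, Vt = np.linalg.svd(Lo.T @ Lr)
    T = np.diag(s ** -0.5) @ U.T @ Lo.T
    Tinv = Lr @ Vt.T @ np.diag(s ** -0.5)
    return T @ F @ Tinv, T @ G, H @ Tinv, s

# Illustrative SISO system (assumed): 1/(s+1) - 1/(s+2) + 1/(s+5).
F = np.diag([-1.0, -2.0, -5.0])
G = np.array([[1.0], [1.0], [1.0]])
H = np.array([[1.0, -1.0, 1.0]])

Fb, Gb, Hb, sigma = balance(F, G, H)
Lam = np.diag(sigma)                      # canonical Gramian

res_reach = Fb @ Lam + Lam @ Fb.T + Gb @ Gb.T    # residual of (2.17)
res_obs = Fb.T @ Lam + Lam @ Fb + Hb.T @ Hb      # residual of (2.18)
E = np.diag(np.sign(Gb[:, 0] * Hb[0, :]))        # signature matrix of (2.19)

# Projection of dynamics: keep the r most significant balanced states.
r = 2
Fr, Gr, Hr = Fb[:r, :r], Gb[:r], Hb[:, :r]
dc_full = (-H @ np.linalg.solve(F, G)).item()
dc_red = (-Hr @ np.linalg.solve(Fr, Gr)).item()

print("canonical elements:", sigma)
print("signature:", np.diag(E))
print("DC gain full / reduced:", dc_full, dc_red)
```

Comparing the DC gains is only one simple check of how well the reduced model reproduces the external behavior; other criteria could equally be used.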

3  Balancing  in  the  LQG-sense 

In the open loop case, the balanced realization led to a natural selection of reduced
order models through the "projection of dynamics". Adopting this procedure for the design
of a reduced order controller is very dangerous however, due to the feedback around the
system. A more direct approach is needed, treating the closed loop as a whole. In fact,
the degree of uncertainty (i.e., the noise covariances) and the performance index or cost
functional of the system should be taken into account for the selection of a reduced
model. Skelton et al. [15-17] suggested a weighting with respect to the "component
costs". In a stochastic context, this may be undesirable, since it may lead to "bad
surprises". Indeed, if the uncertainty associated with a dynamical element with small
expected cost contribution is high, then the actual cost contribution for a sample
trajectory of the stochastic system may be quite different from its (lower) expectation.
This motivates the balancing with respect to the optimal deterministic controller and the
stochastic observer via the separation principle ([18-19], [20]).

The basics of the LQG-theory are well established, and can be found in many
textbooks. The solution to the optimal control problem for a linear system with a
quadratic performance index in the presence of white Gaussian noise falls apart into the
design of the deterministic controller (i.e. assuming perfect knowledge of the state of a
system), and a stochastic observer for the noisy system driven by an external (but assumed
known) input. This constitutes the celebrated Certainty-Equivalence Principle [6]. We
shall briefly summarize the solution for this stochastic control problem. As was done in
the open loop case, here also we shall try to give an interpretation to the solution, and
clarify the different components in it. Several different problems are now of interest.
In digital control (using fixed point arithmetic) the interest is in minimal sensitivity
(with respect to the finite wordlength effects) design of the digital controller. In
general control, one might be interested in a suboptimal but reduced order controller (in
order to reduce the computational burden). Finally one might just be concerned with the
modeling and analysis of the overall feedback system (thus including the plant) for the
purpose of assessing the dominant contributions to the performance index, or uncertainty.
In this case reduced order models for the combined plant and regulator are of interest.

It will be shown that the ideas of balanced realizations, when properly (re)defined, are
again very useful. The LQG-balancing for continuous time systems will be motivated from
the following analysis.


3.1 The LQG-terminal controller

In order to fix the ideas, consider the stochastic system

$$\dot{x} = Fx + Gu + w \; ; \quad \dim x = n,\ \dim u = m \qquad (3.1)$$

$$y = Hx + v \; ; \quad \dim y = p \qquad (3.2)$$

with the initial state normally distributed:

$$x(t_0) \sim N(x_0, P_0) \qquad (3.3)$$


For simplicity (but without loss of generality) we shall assume that w and v are
uncorrelated zero mean white Gaussian noise processes, with covariances Q and R
respectively. They are further assumed to be independent of the initial conditions.
Let the design objective be the minimization of a positive semi-definite quadratic
performance index

$$J = E\left( x'(t_f)\, S_f\, x(t_f) + \int_{t_0}^{t_f} (x'Ax + u'Bu)\, dt \right) \qquad (3.4)$$

It is assumed that the matrices R and B are positive definite, but otherwise arbitrary.
In fact, this amounts to a slight overparametrization, but avoids some preliminary
transformation. First, assuming that the states can be perfectly measured, the
(deterministic) optimal closed loop system will have dynamics:

$$\dot{x} = (F - GC)\,x \; ; \quad x(t_0) = x_0 \qquad (3.5)$$

$$C = B^{-1}G'S \qquad (3.6)$$

$$\dot{S} = -S(F - GC) - (F - GC)'S - A - C'BC \; ; \quad S(t_f) = S_f \qquad (3.7)$$
The solution $S(t; t_f, S_f)$ of (3.7) has an interpretation as a weighting matrix for the
minimum "cost-to-go" from the state $x(t)$ at time $t$. For $S_f = 0$, $S(t)$ is exactly the
observability Gramian of the closed loop system with the fictitious $(n+m)$-dimensional
output

$$z = Lx \qquad (3.8)$$

$$L' = [\, -C'B^{1/2},\ A^{1/2} \,] \qquad (3.9)$$

The performance index then is the output energy (as discussed in section 2.2) of this
system. The presence of the nonzero $S_f$ can be interpreted as the instantaneous release
("flushing") of the remaining "energy" in the system (due to a nonzero state) at time $t_f$
over a weight matrix $S_f$. Equivalently, it is also a measure for the amount of
information, about an a priori unknown initial condition, that this fictitious output would
carry, if corrupted by unit variance white Gaussian noise.


Similarly, the filter error dynamics are given by

$$\dot{\tilde{x}} = (F - KH)\,\tilde{x} + M\omega \; ; \quad \dim \omega = n + p \qquad (3.10)$$

$$M = [\, Q^{1/2},\ KR^{1/2} \,] \qquad (3.11)$$

where

$$\tilde{x} = x - \hat{x} \qquad (3.12)$$

$$K = PH'R^{-1} \qquad (3.13)$$

$$\dot{P} = (F - KH)P + P(F - KH)' + Q + KRK' \; ; \quad P(t_0) = P_0 \qquad (3.14)$$

and $\omega(t)$ is a white noise of unit variance. Again, $P(t; t_0, P_0)$ is a measure for the
uncertainty in the closed loop system (3.10), and characterizes the "disturbability" by
the noise. It is in fact the covariance $E(\tilde{x}\tilde{x}')$ of the estimation error at time $t$, if at
time $t_0$ the error covariance was $P_0$.

The Certainty Equivalence principle states that the optimal control for the system
(3.1 - 3.2), where the states are not perfectly known, is given by feedback of the
estimates of the states over the optimal control gains. The overall equations are thus

$$u = -C\hat{x} \qquad (3.15)$$

$$\dot{\hat{x}} = F\hat{x} + Gu + K(y - H\hat{x}) \; ; \quad \hat{x}_0 = 0 \qquad (3.16)$$

(The variance of the estimate $E(\hat{x}\hat{x}')$ will be denoted by $\Sigma$, while $\Pi$ is used for the state
variance $E(xx')$. By the optimality of the estimates, we have then $\Pi = \Sigma + P$. Note that
the innovation $\varepsilon = (y - H\hat{x})$ acts as a white Gaussian noise with covariance R.) The
following equivalent equations are easily derived.

$$\dot{x} = (F - GC)\,x + GC\,\tilde{x} + w \qquad (3.17)$$

$$\dot{\hat{x}} = (F - GC)\,\hat{x} + KH\,\tilde{x} + Kv \qquad (3.18)$$

$$\dot{\hat{x}} = (F - GC - KH)\,\hat{x} + Ky \qquad (3.19)$$

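The optimal performance index (3.4) can now be evaluated in several different forms
(using partial integration and the Riccati equations for S and P combined with the
Lyapunov equations for $\Pi$ and $\Sigma$ in the closed loop)

$$J = \mathrm{Tr}\left\{ \Pi_f S_f + \int_{t_0}^{t_f} (A\Pi + C'BC\,\Sigma)\, dt \right\} \qquad (3.20)$$

$$\phantom{J} = \mathrm{Tr}\left\{ \Pi_0 S_0 + \int_{t_0}^{t_f} (SQ + C'BC\,P)\, dt \right\} \qquad (3.21)$$

$$\phantom{J} = \mathrm{Tr}\left\{ P_f S_f + \int_{t_0}^{t_f} (AP + KRK'S)\, dt \right\} \qquad (3.22)$$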

Several (equivalent) interpretations follow from these equations. (3.19) and (3.15)
give an open loop representation for the optimal stochastic controller, with input y and
output the control u (figure 1). The equations (3.10) and (3.17) lead to a decomposition
as a cascade of a system (F-KH, M, I), driven by standard white Gaussian noise, connected
via a "transmission" matrix GC to the system (F-GC, I, L) (figure 2). Whereas the former
subsystem represents the dynamics of the estimation error $\tilde{x}$ (driven by the fictitious
noise $\omega$ for which $M\omega = w - Kv$), the latter will represent the plant states x, if and only if
an additional Gaussian input (w) is summed at its input. This additional noise has
covariance Q, and is correlated with the noise $\omega$ according to $M\,E(\omega w') = Q$.

In terms of this decomposition we define a fictitious output $z = z_1 + z_2$, where $z_1$ is
the $(n+m)$-dimensional output from the second subsystem and $z_2 = \Gamma\tilde{x}$, where $\Gamma$ is the $(n+m)$
by $n$ matrix with $\Gamma' = [\, C'B^{1/2},\ 0 \,]$ and L is as defined in (3.9). Because the outputs $z_1$ and
$z_2$ are "maximally interfering" (due to their correlation), the variance of their sum $z$ is
actually the difference of their individual variances, which is the integrand in (3.20).

Another interesting representation (figure 3) can be derived starting from the
equations (3.10) and (3.18). Again a cascade is formed, beginning with the system (F-KH,
M, I) driven by standard white Gaussian noise. This time the output (which is the
representation of the state estimation error) drives, via the "transmission matrix" KH,
the system (F-GC, I, L). This system will have the variable $\hat{x}$ as state, if again an
additional "correction" input Kv (with variance KRK') is added, having a correlation T
with $\omega$ satisfying:

$$M\,T\,K' = E\left[(M\omega)(Kv)'\right] = -KRK'$$

A fictitious output $z = z_3 + z_4$ is defined which generates the integrand in the performance
index. In this case, it is readily verified that this is accomplished by $z_3$, the output
of the cascade, and $z_4 = N\tilde{x}$, where $N' = [\, 0,\ A^{1/2} \,]$. Note that in this decomposition the
input to the $\hat{x}$-subsystem is actually K times the innovations process. This is known to
act as a white noise. Here $z_3$ and $z_4$ are noninterfering (uncorrelated).

3.2 Interpretation of the Cost Functional

The two decompositions described in the last paragraph lead to a "cost-decoupled"
interpretation of the various terms. For simplicity, we shall fix the ideas on the
LQG-regulator problem. The matrices F, G, H, Q, and R are all supposed to be time-
invariant, and it will be assumed that the statistical stationary state exists. For the
steady-state regulator problem, the cost rate, rather than the cost (which is infinite), is
computed. In the LQG terminal controller, let $t_0$ and $t_f$ approach $-\infty$ and $+\infty$ respectively.
Then the cost rate is the limit of the expected cost per time unit. It follows then from
the equations (3.20 - 3.22) that the cost rate is

$$j = \mathrm{Tr}\{A\Pi + C'BC\,\Sigma\} = \mathrm{Tr}\{SQ + C'BC\,P\} = \mathrm{Tr}\{AP + KRK'S\} \qquad (3.23)$$

where now P and S satisfy the algebraic Riccati equations, respectively

$$(F - KH)\,P + P\,(F - KH)' + MM' = 0 \qquad (3.24)$$

$$(F - GC)'\,S + S\,(F - GC) + L'L = 0 \qquad (3.25)$$

As already discussed, the following identifications follow, where the entry • is
irrelevant:

P is the REACHABILITY gramian for the system (M, F-KH, •)

S is the OBSERVABILITY gramian of the system (•, F-GC, L)
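
The equality of the three cost-rate expressions in (3.23) can be verified by hand in the scalar case, where both Riccati equations have closed-form solutions. The following sketch is an illustration under assumed numerical data (the values of `f`, `g`, `h`, `q`, `r`, `a`, `b` are hypothetical); lower-case names stand for the scalar versions of F, G, H, Q, R, A, B.

```python
import math

# Illustrative scalar data (assumed): dx/dt = f x + g u + w, y = h x + v.
f, g, h = 0.5, 1.0, 1.0
q, r = 1.0, 0.5      # noise covariances Q and R
a, b = 2.0, 1.0      # state and control weights A and B

# Stabilizing solutions of the two scalar algebraic Riccati equations.
S = b * (f + math.sqrt(f * f + a * g * g / b)) / (g * g)
P = r * (f + math.sqrt(f * f + q * h * h / r)) / (h * h)
C = g * S / b        # control gain, C = B^{-1} G' S
K = P * h / r        # filter gain,  K = P H' R^{-1}

# Lyapunov forms (3.24) and (3.25): both residuals should vanish.
res_P = 2.0 * (f - K * h) * P + (q + K * r * K)     # MM' = Q + K R K'
res_S = 2.0 * (f - g * C) * S + (a + C * b * C)     # L'L = A + C' B C

# Stationary variance of the estimate (driven by K times the innovations).
Sigma = K * r * K / (2.0 * (g * C - f))
Pi = Sigma + P

j1 = a * Pi + C * b * C * Sigma     # Tr{A Pi + C'BC Sigma}
j2 = S * q + C * b * C * P          # Tr{S Q + C'BC P}
j3 = a * P + K * r * K * S          # Tr{A P + K R K' S}
print(res_P, res_S, j1, j2, j3)
```

With these particular numbers S = 2 and P = 1, and all three expressions evaluate to a cost rate of 6.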

Consider now a fictitious system, consisting of the two decoupled subsystems (F-GC,
$Q^{1/2}$, L) and (F-KH, M, $\Gamma$). If both are driven by independent standard white Gaussian
noise, then their outputs (respectively $z_1$ and $z_2$) will be uncorrelated as well, and their
expected "powers" additive. The contribution of the first is exactly $\mathrm{Tr}\{Q\,\mathcal{O}(F-GC, L)\} = \mathrm{Tr}\{QS\}$,
while the contribution of the second is $\mathrm{Tr}\{MM'\,\mathcal{O}(F-KH, \Gamma)\} = \mathrm{Tr}\{\Gamma'\Gamma\,\mathcal{R}(F-KH, M)\} = \mathrm{Tr}\{C'BC\,P\}$.
We used the fact that in the steady state, the A-weighted output power of a
system (F, G, H) driven by a zero mean white Gaussian input of variance Q can be
expressed as

$$\mathrm{Tr}\, H'AH\,\mathcal{R}_{Q^{-1}} = \mathrm{Tr}\, GQG'\,\mathcal{O}_A \qquad (3.26)$$
But note that this is exactly the system of figure 2 if one replaced the input noises
by uncorrelated ones, and set the transmission matrix GC to zero. The two partial
outputs are then indeed noninterfering. This leads to the INTERPRETATION that Tr{QS} is
the partial cost rate in the deterministic closed loop system due to the process noise w
only, i.e., assuming full knowledge of the state x. The part Tr{C'BCP} is the cost rate
due to the state estimation error. It is as if the role of the transmission matrix GC and
the input correlation is to guarantee maximal destructive interference between the
outputs (so that the variance of the output is the difference of the individual
variances).

Similarly, with regard to figure 3, we have the estimator subsystem (F-GC, K, L)
driven by the innovations $\varepsilon$, with output $z_3$, and the estimation-error system (F-KH, M,
N) driven by $\omega$, with output $z_4$. The two outputs are uncorrelated, and their sum $z = z_3 + z_4$
has covariance equal to the cost rate of the optimally controlled system. A cost-
equivalent decoupled form can be constructed, consisting of the system (F-GC, K, L) (the
estimator system) driven by v, considered independent of the noise $\omega$, which drives the
error system (M, F-KH, N). Their output "power" contributions are respectively Tr{KRK'S}
and Tr{AP}. Thus the role of the transmission matrix KH in figure 3 seems to be to
effectively uncorrelate the two input noises.

The contribution Tr{KRK'S} can thus be identified as the cost rate for the closed
loop system under the assumption that the estimated state is the correct state. However,
since not $\hat{x}$, but x is the state of the closed loop system, a correction occurs due to the
imperfect knowledge of the state (i.e. $\tilde{x}$). This is represented by the (independent)
contribution of the error subsystem (M, F-KH, N), driven by the equivalent noise $\omega$, with
cost-rate contribution Tr{AP}.




3.3 LQG-Balanced Realizations

Since S and P transform under a similarity transformation T as $T^{-T}ST^{-1}$ and $TPT'$
respectively, it is possible to transform any given realization such that in the new
coordinate system,

$$P = S = \Omega \qquad (3.27)$$

where $\Omega$ is diagonal, with its elements $\omega_i$ ordered in magnitude. $\Omega$ is called the CANONICAL
RICCATIAN. The new realization will then be referred to as the LQG-BALANCED realization.
Note that Q as well as A also transform under the similarity. (B and R are of course
invariant as they relate to "external" variables).

The cost rate for the optimally regulated system is then, in the balanced coordinates,

$$j = \mathrm{Tr}\, \Omega\,(Q + C'BC) = \mathrm{Tr}\, \Omega\,(A + KRK') \qquad (3.28)$$

or, using the fact that $\Omega$ is diagonal:

$$j = \sum_{i=1}^{n} \omega_i \left( Q_{ii} + \omega_i^2\, (GB^{-1}G')_{ii} \right)
   = \sum_{i=1}^{n} \omega_i \left( A_{ii} + \omega_i^2\, (H'R^{-1}H)_{ii} \right) \qquad (3.29)$$

It is clear that the cost rates corresponding to the individual state components are not
simply determined by the magnitudes of the elements of the canonical Riccatian, but also
depend on the relative magnitudes of the diagonal elements of $Q$, $GB^{-1}G'$, or $A$ and $H'R^{-1}H$.
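
In the scalar case LQG balancing reduces to a scaling of the state, which makes the component formulas easy to check. The sketch below is illustrative only: the numerical data and the variable names are assumptions, and lower-case letters stand for the scalar F, G, H, Q, R, A, B. It scales the state so that P = S, then evaluates the two component sums for the cost rate.

```python
import math

# Illustrative scalar data (assumed), same roles as F, G, H, Q, R, A, B.
f, g, h = 0.5, 1.0, 1.0
q, r = 1.0, 0.5
a, b = 2.0, 1.0

# Steady-state Riccati solutions in the original coordinates.
S = b * (f + math.sqrt(f * f + a * g * g / b)) / (g * g)
P = r * (f + math.sqrt(f * f + q * h * h / r)) / (h * h)

# Scaling x_bal = T x: P -> T^2 P and S -> S / T^2, so T^4 = S / P balances them.
T2 = math.sqrt(S / P)                 # T squared
omega = math.sqrt(P * S)              # canonical Riccatian (single element here)
P_bal, S_bal = T2 * P, S / T2

# Transformed data: G -> T G, H -> H / T, Q -> T Q T', A -> T^{-T} A T^{-1}.
q_bal, a_bal = T2 * q, a / T2
gBg_bal = T2 * g * g / b              # (G B^{-1} G') in balanced coordinates
hRh_bal = h * h / (T2 * r)            # (H' R^{-1} H) in balanced coordinates

j_control_form = omega * (q_bal + omega ** 2 * gBg_bal)
j_filter_form = omega * (a_bal + omega ** 2 * hRh_bal)
j_reference = S * q + (g * S / b) ** 2 * b * P      # Tr{SQ + C'BC P}, original coordinates

print(P_bal, S_bal, omega)
print(j_control_form, j_filter_form, j_reference)
```

The cost rate is a coordinate-free quantity, so both balanced expressions reproduce the value obtained in the original coordinates.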

If the system (in balanced coordinates) is partitioned into two coupled subsystems
with $\Omega_1 > \Omega_2$, then a sufficient condition for the part corresponding with $\Omega_1$ to have the
dominant cost rate contribution is that either of the following sets of inequalities is
satisfied (the indices refer to the block entries):

$$\begin{cases} \Omega_1 > \Omega_2 \\ Q_{11} > Q_{22} \\ (GB^{-1}G')_{11} > (GB^{-1}G')_{22} \end{cases} \qquad (3.30)$$

$$\begin{cases} \Omega_1 > \Omega_2 \\ A_{11} > A_{22} \\ (H'R^{-1}H)_{11} > (H'R^{-1}H)_{22} \end{cases} \qquad (3.31)$$

The various terms can be interpreted as quantifying the following:

$Q$ : Disturbance (noise) in the plant.

$GB^{-1}G'$ : "Potential" of the system input to decrease the regulation cost.

$A$ : Cost on the state deviations.

$H'R^{-1}H$ : Information (about the state) gained from the measurements (observations).

The  first  set  of  inequalities  expresses  that  the  set  of  variates  that  are  most 
disturbed  by  noise  and  for  which  at  the  same  time  the  input-potential  is  high,  are 
dominant.  (The  larger  the  input-potential,  the  less  the  cost  of  control).  Alternatively, 
state  variables  for  which  the  information  contained  in  the  measurements  and  the  state-cost 
is  highest,  also  contribute  to  the  major  parts  in  the  regulation  cost-rate. 

If one were only interested in obtaining a simple model for the optimally regulated
system, for instance with the goal of identifying the dominant contributions to the
uncertainties and the costs, then the combined plant and regulator may be reduced by
"projection of dynamics". The decision on the order of the reduced model can for instance
be based on thresholding ratios of partial sums of the cost-rate contributions, where

$$\kappa(r) = \sum_{i=1}^{r} \omega_i \left( Q_{ii} + \omega_i^2\, (GB^{-1}G')_{ii} \right)$$

If on the other hand one wants to design a reduced order regulator for a fixed plant,
the above cannot be taken over directly. It has been shown that the design of a reduced
order regulator based on a reduced order model of the plant may be unsatisfactory. Also,
a "projection of dynamics" approach on the full order combined plant and regulator, based
on the magnitude of the elements of the canonical Riccatian alone, may not guarantee the
stability of the regulation of the (full order) plant with the obtained reduced regulator
([16-17], [20]). Also in the open loop case, the canonical Gramian $\Lambda$ provides insufficient information

m,  [2i]). 

In  reference  to  the  decompositions  in  figures  2  and  3,  the  following  property  of  the 
transmission  matrix  is  derived. 

Theorem: The lower (upper) triangular part of the transmission matrix GC (KH) is dominant
in the balanced coordinates.

Proof: Since $T = GC = GB^{-1}G'\Omega$, it follows that $\Omega T$ is symmetric. But then $\omega_i T_{ij} = T_{ji}\,\omega_j$
for all $i$ and $j$. Hence, $T_{ji} = T_{ij}\,\omega_i/\omega_j$. Since by assumption the elements $\omega_i$ are
ordered (in decreasing magnitude), we get $|T_{ji}| \ge |T_{ij}|$ whenever $j > i$. Similarly, $X = KH = \Omega H'R^{-1}H$
and thus $X\Omega$ is symmetric, from which $|X_{ji}| \le |X_{ij}|$ whenever $j > i$.
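
The dominance property is easy to observe numerically. In the sketch below, the diagonal entries `omega`, the random factors, and the names `W` and `V` (standing in for the positive semi-definite matrices $GB^{-1}G'$ and $H'R^{-1}H$ in balanced coordinates) are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
omega = np.array([4.0, 2.0, 1.0, 0.5])     # canonical elements, decreasing (assumed values)
Om = np.diag(omega)

Gm = rng.standard_normal((n, 2))
W = Gm @ Gm.T                              # plays the role of G B^{-1} G' (with B = I)
Hm = rng.standard_normal((3, n))
V = Hm.T @ Hm                              # plays the role of H' R^{-1} H (with R = I)

T = W @ Om                                 # transmission matrix GC
X = Om @ V                                 # transmission matrix KH

pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
lower_dominant = all(abs(T[j, i]) >= abs(T[i, j]) for i, j in pairs)
upper_dominant = all(abs(X[i, j]) >= abs(X[j, i]) for i, j in pairs)

print("Omega T symmetric:", np.allclose(Om @ T, (Om @ T).T))
print("X Omega symmetric:", np.allclose(X @ Om, (X @ Om).T))
print("lower part of GC dominant:", lower_dominant)
print("upper part of KH dominant:", upper_dominant)
```

The ratio between mirrored entries is exactly the ratio of the corresponding canonical elements, so the dominance becomes more pronounced as the elements of the canonical Riccatian spread apart.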

Consider figure 2. Keeping the ordering in mind for the balanced case, it follows
that low uncertainty states are more perturbed by the high uncertainty states than vice
versa. It further follows from the positive semi-definiteness of the "input-potential"
that for $j > i$

$$(GB^{-1}G')_{ij}^2 \le (GB^{-1}G')_{ii}\,(GB^{-1}G')_{jj}$$

and thus that the elements in the upper left block of $(GB^{-1}G')\Omega$ are larger in magnitude
than the elements in the upper right block of T. Hence, $\tilde{x}_2$ is almost decoupled from the
closed loop system (I, F-GC, L). In fact, for the same reason the upper right block of
the closed loop system matrix F-GC will be close to that of F itself, so that there is
almost no feedback from the $x_2$ subsystem. One expects, therefore, that the closed loop
dynamics of the plant controlled with the reduced regulator would have a near optimal
behavior. Similar arguments work with figure 3.


4  STOCHASTIC  MODELING 
4.1  The  Stochastic  Realization  Problem 

Desai and Pal [22-23] extended the ideas of balancing in the LQG sense to the
stochastic realization problem. Balancing is here with respect to the state covariance
matrices in the forward and the backward innovation representations. These matrices solve
dual Riccati equations. The elements of the "canonical Riccatian" are connected to the
canonical correlations between the past and the future observations. Arun and Kung [24]
contrasted the method based on canonical correlations with a method based on Principal
Components. Vaccaro showed its connection with deterministic open loop balancing [25].
Ramos and Verriest [26-27] unified the theory by showing that both the canonical
correlation analysis (CCA) and the principal component analysis (PCA) are special cases of
a more general optimization problem, using a new tool from multivariate statistics: the
RV-coefficient introduced by Escoufier [28]. If two zero mean random vectors $X_1$ and $X_2$
(not necessarily of the same dimension) have covariance matrix

$$\operatorname{cov}(X_1, X_2) = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \qquad (4.1)$$

then the RV-coefficient is defined as

$$RV(X_1, X_2) = \frac{\operatorname{Tr}(\Sigma_{12}\Sigma_{21})}{\sqrt{\operatorname{Tr}(\Sigma_{11}^2)\,\operatorname{Tr}(\Sigma_{22}^2)}} \qquad (4.2)$$

This measure shares many of the properties of a correlation coefficient, but is not one
itself. (It is the square of the correlation if $X_1$ and $X_2$ are scalar.) It also allows the
computation of a "figure of merit" for each algorithm in a consistent way.
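To make the definition concrete, the following small sketch evaluates the RV-coefficient from the blocks of a joint covariance matrix, using the standard form Tr(S12 S21) / sqrt(Tr(S11^2) Tr(S22^2)). The helper names and the numbers are illustrative assumptions. In the scalar case the result equals the squared correlation, as noted above.

```python
import math


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def rv_coefficient(s11, s12, s22):
    """RV = Tr(S12 S21) / sqrt(Tr(S11^2) Tr(S22^2)), with S21 = S12'."""
    s21 = transpose(s12)
    numerator = trace(matmul(s12, s21))
    denominator = math.sqrt(trace(matmul(s11, s11)) * trace(matmul(s22, s22)))
    return numerator / denominator


# Scalar case: var(X1) = 4, var(X2) = 1, cov = 1, so the correlation is 0.5.
rv_scalar = rv_coefficient([[4.0]], [[1.0]], [[1.0]])

# Identical vectors (X2 = X1): all three blocks coincide and RV equals 1.
s = [[2.0, 0.5], [0.5, 1.0]]
rv_identical = rv_coefficient(s, s, s)
print(rv_scalar, rv_identical)
```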

This formalism is applied to stochastic realization theory as follows. Given the
correlation sequence $\{\Lambda_k\}$ of a discrete time stationary stochastic sequence $\{y_k\}$, the
forward and backward predictor subspaces are

$$X_k = \operatorname{Span}(Y_k^+ \mid Y_k^-), \qquad Z_{k-1} = \operatorname{Span}(Y_k^- \mid Y_k^+) \qquad (4.3)$$

where $Y^+$ and $Y^-$ respectively correspond to the "future" and the "past" of the process.
Here $(A \mid B)$ denotes the projection of span(A) onto span(B). These two spaces form the
information interface between the past and the future. Defining, as usual,

$$H_k = E\{Y_k^+ (Y_k^-)'\}, \quad R_k^+ = E\{Y_k^+ (Y_k^+)'\}, \quad R_k^- = E\{Y_k^- (Y_k^-)'\} \qquad (4.4)$$

then CCA is equivalent to the problem of finding transformations L and M such that
$RV(L'Y^+, M'Y^-)$ is maximized, subject to the constraints that $L'(R^+)L$ and $M'(R^-)M$ are
diagonal. The PCA is equivalent to the problem of maximizing $RV(Y^+, M'Y^-)$ over M under
the constraint $M'(R^-)M$ diagonal. The two methods are also referred to as the one-sided
and the two-sided stochastic realization problem.

4.2  Geometrical  Interpretation:  Correlation  between  Subspaces 

Let H be a Hilbert space. The set of all closed subspaces of H has the structure of
an orthocomplemented complete lattice, also called a logic. The lattice of all closed
subspaces of H and the lattice Proj H of all orthoprojectors on H are isomorphic.

In his study of the mathematical foundations of quantum mechanics, Mackey posed the
problem of finding all positive measures on the closed subspaces of a Hilbert space. Such
a measure $\mu$ must have the property that for any countable collection $\{S_i\}$ of mutually
orthogonal closed subspaces the mapping is $\sigma$-additive, i.e.,

$$\sum_i \mu(S_i) = \mu\Big(\bigoplus_i S_i\Big) \qquad (4.5)$$

A measure satisfying the above property is for instance obtained by selecting a vector v
in the Hilbert space H, and letting for each subspace A of H

$$\mu_v(A) = \|P_A(v)\|^2 \qquad (4.6)$$

where $P_A$ is the projection operator on A. Clearly, finite convex combinations of such
measures also satisfy the conditions for such measures, and passing to the limit, any
positive semidefinite trace class operator T also defines such a measure via

$$\mu(A) = \operatorname{Tr}(T P_A) \qquad (4.7)$$

Gleason [29] has shown that in a separable Hilbert space of dimension at least three,
every measure on the closed subspaces can be represented as above, with T a positive
semidefinite operator of trace class.
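The additivity property (4.5) of a measure of the form (4.7) is easy to verify in a finite dimensional setting. The sketch below works in R^3 with T = v v', builds orthoprojectors from orthonormal basis vectors, and evaluates mu(A) = Tr(T P_A); the vector v, the subspaces, and the function names are chosen purely for illustration.

```python
def outer(v):
    """Rank-one operator v v'."""
    return [[a * b for b in v] for a in v]


def projector(basis):
    """Orthoprojector onto the span of an orthonormal list of vectors."""
    n = len(basis[0])
    return [[sum(u[i] * u[j] for u in basis) for j in range(n)]
            for i in range(n)]


def measure(t, p):
    """mu(A) = Tr(T P_A) for the operator T and the orthoprojector P_A."""
    n = len(t)
    return sum(t[i][k] * p[k][i] for i in range(n) for k in range(n))


e1, e2, e3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
v = [1.0, 2.0, 2.0]
t_v = outer(v)

mu_a = measure(t_v, projector([e1]))          # squared norm of P_A v
mu_b = measure(t_v, projector([e2]))
mu_sum = measure(t_v, projector([e1, e2]))    # measure of the direct sum
mu_all = measure(t_v, projector([e1, e2, e3]))
print(mu_a, mu_b, mu_sum, mu_all)
```

The measure of the whole space equals Tr(T), here the squared norm of v.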

Consider now a tensor product Hilbert space $R^p \otimes H$, and let $\{\psi_i\}$ be a Complete
Orthonormal Set (CONS) in H. Any vector x in this tensor product space has then a
decomposition

$$x = \sum_i \alpha_i\, |x_i\rangle\langle\psi_i|, \qquad x_i \in R^p \ \text{for all } i \qquad (4.8)$$

The vector x will be referred to as a "prior". Let for all A in Proj $R^p$: $\mu_i(A) = \|P_A(x_i)\|^2$
and define a "superposition of measures" on Proj $R^p$ as $\mu_x = \sum_i \alpha_i^2\, \mu_i$ for some
square summable positive weights $\{\alpha_i\}$. By Gleason's theorem [29] it follows that there
exists an operator $T_x : R^p \to R^p$ such that

$$\mu_x(A) = \operatorname{Tr}(T_x P_A) \qquad (4.9)$$

This operator is characteristic for the given vector x in $R^p \otimes H$ (in fact, a "sufficient
statistic"), and one can think of T (or $\mu$) as "conditioned" by the vector x. Since

$$T_x = \sum_i \alpha_i^2\, |x_i\rangle\langle x_i| = x x^* \qquad (4.10)$$

it can be interpreted as a Gramian or covariance operator.

The measure $\mu_x(A)$ gives a numeric value to the closeness of A to $R^p$, given the prior
x [30]. Define also the extended projectors $P^B \in \operatorname{Proj}(R^p \otimes H)$ by

$$P^B(x) = \sum_i \alpha_i\, P_B |x_i\rangle\langle\psi_i| \qquad (4.11)$$

They allow now the definition of the "variance" and "covariance" of subspaces in Proj $R^p$
[31]. The "posterior" variance of $A \in \operatorname{Proj} R^p$ given x is then the operator from $R^p$ to $R^p$

$$(P^A x)(P^A x)^* = \sum_i \alpha_i^2\, P_A |x_i\rangle\langle x_i| P_A = P_A T_x P_A \qquad (4.12)$$

and the covariance

$$(P^B x)(P^A x)^* = \sum_i \alpha_i^2\, P_B |x_i\rangle\langle x_i| P_A = P_B T_x P_A \qquad (4.13)$$

This  is  simply  interpreted  as  the  restriction  to  B  of  the  mapping  Tx  restricted  to  the 
subspace  A,  and  displays  the  coupling  or  interface  between  A  and  B  given  x.  In  order  to 
Abstract


This paper starts with a discussion on the Projection
of Dynamics principle as a modal reduction tool.
This is exploited in modal approximations and reductions
based on open loop balancing. Similar ideas can
be employed for the reduction of the closed loop,
terminal controller problem, in the presence of noise,
leading to the LQG balanced realizations. An interpretation
of the various terms in the optimal performance
index leads to the motivation for this reduction
technique. It is shown that several 'natural' restrictions
occur, constraining the faithfulness of this
reduction technique. As in the open loop case, it
illustrates the insufficiency of the canonical Riccatian
towards the determination of the reduced order
models.


1. Introduction

The theory of balanced realizations, introduced by
Moore [5], has proven to be very useful in the model
reduction problem for linear systems. The method
relies on a transformation of the realization of a
given system to the balanced coordinates. In this
basis, the reachability and observability gramians of
the system are equal and diagonal. These canonical
elements (also termed second order modes) can be
interpreted in terms of energies or uncertainties [10]
and display the relative importance of the (balanced)
state components with respect to the external system
behavior, as the 'joint degree of reachability and
observability' associated with this system dimension.
The reduced model is then obtained by deleting the
system dimensions corresponding to the smaller canonical
elements, i.e., Projection of Dynamics. It has
been shown by Kabamba [3] that the above sketched
procedure does not yield faithful reduced order models
in some cases. The reason is that the canonical
gramian only has n 'free' parameters, whereas the n-th
order model requires 2n parameters (we shall restrict
our discussion to the Single Input Single Output case).
Hence, the decision on the reduced order model is only
based on half of the parameter set.

The open loop balancing methods have been extended
to the closed loop LQG problem by this author [8,9] and
Jonckheere and Silverman [2] for the LQG regulator. The
performance of the LQG reduced system was also shown to
depend on other variables, not directly related to the
elements of the 'canonical Riccatian.'

This paper will motivate the use of and determine
a 'complete' set of parameters, which suffice as an
information base to determine reduced order models.
The more general terminal controller problem will be
discussed. The LQG regulator problem will then follow
as a special case (infinite final time, and time invariant
parameters of system, noise, and performance
index). Section 2 describes briefly the method of
Projection of Dynamics. In Section 3, the LQG balancing
is discussed, and an interpretation of the various
terms in the optimal performance index is given.
This then leads to a set of criteria to determine the
reduced order models.

2. Projection of Dynamics

By Projection of Dynamics (POD) of a system (A, B,
C), we understand the reduction of the given system (A,
B, C) to a smaller system, which has as its dynamics
the projection of the original dynamics of the system,
confined to the retained state variables. Without loss
of generality, one can thus assume that the POD is
acting on the (1,1) block system obtained from partitioning
the given system into

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}, \quad C = [\,C_1, \ C_2\,]$$

thus yielding the subsystem $(A_{11}, B_1, C_1)$.
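In code, the projection amounts to keeping the leading block of each matrix. The following minimal sketch assumes matrices stored as lists of rows and a retained order r; the function name and the example system are hypothetical.

```python
def project_dynamics(a, b, c, r):
    """Return the (1,1) block subsystem (A11, B1, C1) of order r.

    a: n x n state matrix, b: n x m input matrix, c: p x n output matrix,
    all given as lists of rows. The first r state variables are retained.
    """
    a11 = [row[:r] for row in a[:r]]
    b1 = [row[:] for row in b[:r]]
    c1 = [row[:r] for row in c]
    return a11, b1, c1


# Hypothetical third order single input single output system.
a = [[-1.0, 0.2, 0.0],
     [0.1, -2.0, 0.3],
     [0.0, 0.4, -5.0]]
b = [[1.0], [0.5], [0.1]]
c = [[1.0, 0.5, 0.1]]

a11, b1, c1 = project_dynamics(a, b, c, r=2)
print(a11, b1, c1)
```

The coupling blocks A12 and A21 are simply discarded, which is why the quality of the reduced model depends entirely on the coordinates in which the projection is carried out.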

The POD technique arises, for instance, in the
model reduction techniques based on modal approximation,
the above introduced (open loop) balancing
techniques, and the reduction via the Hessenberg form.
In each of these cases, the given system is first
transformed to a special realization (the diagonal or
Gilbert realization in the first case, the balanced
realization in the second, and the Hessenberg realization
in the last). The technique is also the one used to obtain
a minimal realization from the Kalman canonical
decomposition of a system into its observable and
reachable part, its observable and nonreachable part,
its unobservable and reachable part, and finally, its
nonreachable and unobservable part. Finally, in the
design of reduced order observers, the technique
surfaces in a restricted sense. Only the subdynamics
of the A and B matrices are of interest in the latter
case.

By itself, POD, without any further restrictions,
is not a viable reduction technique by virtue of the
following fact:

Theorem. Given any observable realization of order n,
and r < n, then any arbitrary r-th order denominator
polynomial can be obtained by POD, after a suitable
similarity transformation of the full order system.

For instance, in the design of reduced order observers,
one method makes use of the above fact by determining a
similarity transformation such that the n-1 by n-1
(1,1)-submatrix of A has arbitrary eigenvalues [4].
More general transformations than the one in [4] are
needed to also be able to vary the (n-2)-th order
numerator polynomial.

3. Balancing in the LQG Sense

In the open loop case, the balanced realization
led to a natural selection of reduced order models
through the 'projection of dynamics.' Adopting this
procedure for the design of a reduced order controller
is very dangerous, however, due to the feedback around
the system. A more direct approach is needed, treating
the closed loop as a whole. In fact, the degree of
uncertainty (i.e., the noise covariances) and the
performance index or cost functional of the system
should be taken into account for the selection of a
reduced model. Skelton [7] suggested a weighting with
respect to the 'component costs.' In a stochastic
context, this may be undesirable, since it may lead to
'bad surprises.' Indeed, if the uncertainty associated
with a dynamical element, with small expected cost
contribution, is high, then the actual cost contribution
for a sample trajectory of the stochastic system
may be quite different from its (lower) expectation.
This motivates the balancing with respect to the
optimal deterministic controller, and the stochastic
observer via the separation principle.

The basics of the LQG theory are well established,
and can be found in many textbooks. The solution to
the optimal control problem for a linear system with a
quadratic performance index in the presence of white
gaussian noise falls apart into the design of the
deterministic controller (i.e., assuming perfect
knowledge of the state of a system), and a stochastic
observer for the noisy system driven by an external
(but assumed known) input. This constitutes the celebrated
Certainty-Equivalence Principle [1]. We shall
briefly summarize the solution for this stochastic
control problem. Also we shall try to give an interpretation
to the solution, and clarify the different
components in it. Several different problems are now
of interest. In digital control (using fixed point
arithmetic), the interest is in minimal sensitivity
(with respect to the finite wordlength effects) design
of the digital controller. In general control, one
might be interested in a suboptimal but reduced order
controller (in order to reduce the computational
burden). Finally, one might just be concerned with the
modeling and analysis of the overall feedback system
(thus including the plant) for the purpose of assessing
the dominant contributions to the performance index, or
uncertainty. In this case, reduced order models for the
combined plant and regulator are of interest. It will
be shown that the ideas of balanced realizations, when
properly (re)defined, are again very useful. The LQG
balancing for continuous time systems will be motivated
from the following analysis.

3.1 The LQG Terminal Controller

In order to fix the ideas, consider the stochastic
system

$$\dot{x} = Fx + Gu + w; \quad \dim x = n, \ \dim u = m \qquad (1)$$

$$y = Hx + v; \quad \dim y = p \qquad (2)$$

For simplicity (but without loss of generality), we
shall assume that w and v are uncorrelated zero mean
white gaussian noise processes, with covariances Q and
R, respectively. They are further assumed to be
independent from the initial conditions. Let the
design objective be the minimization of a positive
semidefinite quadratic performance index

$$J = E\Big\{ x'(t_f) S_f x(t_f) + \int_{t_0}^{t_f} (x'Ax + u'Bu)\,dt \Big\} \qquad (4)$$

It is assumed that the matrices R and B are positive
definite, but otherwise arbitrary. In fact, this
amounts to a slight overparameterization, but avoids
some preliminary transformations. First, assuming that
the states can be perfectly measured, the (deterministic)
optimal closed loop system will have dynamics

$$\dot{x} = (F - GC)x; \quad x(t_0) = x_0 \qquad (5)$$

$$C = B^{-1} G' S \qquad (6)$$

$$\dot{S} = -S(F - GC) - (F - GC)'S - A - C'BC; \quad S(t_f) = S_f \qquad (7)$$

The solution $S(t; t_f, S_f)$ of (7) has an interpretation
as a weighting matrix for the minimum 'cost to go' from
the state x(t) at time t. For $S_f = 0$, S(t) is exactly
the observability gramian of the closed loop system
sporting the fictitious (n+m)-dimensional output

$$z = Lx \qquad (8)$$

$$L = \begin{bmatrix} B^{1/2} C \\ A^{1/2} \end{bmatrix} \qquad (9)$$

The performance index then is the output energy of
this system. The presence of the nonzero $S_f$ can be
interpreted as the instantaneous release ('flushing')
of the remaining 'energy' in the system (due to a
nonzero state) at time $t_f$ over a weight matrix $S_f$.
Equivalently, it is also a measure for the amount of
information, about an a priori unknown initial
condition, that this fictitious output would carry, if
corrupted by unit variance white gaussian noise.
Similarly, the filter error dynamics are given by

$$\dot{\tilde{x}} = (F - KH)\tilde{x} + M\omega; \quad \dim \omega = n + p \qquad (10)$$

$$M = [\,Q^{1/2}, \ -K R^{1/2}\,] \qquad (11)$$

where

$$\tilde{x} = x - \hat{x} \qquad (12)$$

$$K = P H' R^{-1} \qquad (13)$$

$$\dot{P} = (F - KH)P + P(F - KH)' + Q + KRK'; \quad P(t_0) = P_0 \qquad (14)$$

and $\omega(t)$ is a white noise of unit variance. Again,
$P(t; t_0, P_0)$ is a measure for the uncertainty in the
closed loop system (10), and characterizes the
'disturbability' by the noise. It is, in fact, the
covariance $E(\tilde{x}\tilde{x}')$ of the estimation error at time t, if
at time $t_0$ the error covariance was $P_0$.

The Certainty Equivalence principle states that
the optimal control for the system (1)-(2), where the
states are now not perfectly known, is given by feedback of
the estimates of the states over the optimal control
gains. The overall equations are thus

$$u = -C\hat{x} \qquad (15)$$

$$\dot{\hat{x}} = F\hat{x} + Gu + K(y - H\hat{x}); \quad \hat{x}_0 = 0 \qquad (16)$$

(The variance of the estimate $E(\hat{x}\hat{x}')$ will be denoted by
$\Sigma$, while $\Pi$ is used for the state variance $E(xx')$. By
the optimality of the estimates, we have then $\Pi = \Sigma + P$.)
Note that the innovation $\varepsilon = (y - H\hat{x})$ acts as a white
gaussian noise with covariance R. The following equivalent
equations are easily derived.


$$\dot{x} = (F - GC)x + GC\tilde{x} + w \qquad (17)$$

$$\dot{\hat{x}} = (F - GC)\hat{x} + KH\tilde{x} + Kv \qquad (18)$$

$$\dot{\hat{x}} = (F - GC - KH)\hat{x} + Ky \qquad (19)$$

The optimal performance index (4) can now be evaluated
in several different forms (using partial integration
and the Riccati equations for S and P combined with the
Lyapunov equations for $\Pi$ and $\Sigma$ in the closed loop)

$$J = \operatorname{Tr}\Big\{ \Pi(t_f) S_f + \int_{t_0}^{t_f} (A\Pi + C'BC\,\Sigma)\,dt \Big\} \qquad (20)$$

$$\phantom{J} = \operatorname{Tr}\Big\{ P(t_f) S_f + \int_{t_0}^{t_f} (AP + KRK'S)\,dt \Big\} \qquad (22)$$

Several (equivalent) interpretations follow from
these equations. Equations (15) and (19) give an open
loop representation for the optimal stochastic
controller, with input y and output the control u
(Fig. 1). The Eqs. (10) and (17) lead to a decomposition
as a cascade of a system (F-KH, M, I), driven by a
standard white gaussian noise, connected via a
'transmission' matrix GC to the system (F-GC, I, L)
(Fig. 2). Whereas the former subsystem represents the
dynamics of the estimation error $\tilde{x}$ (driven by the fictitious
noise $\omega$ for which $M\omega = w - Kv$), the latter will
represent the plant states x, if and only if an additional
gaussian input (w) is summed at its input. This
additional noise has covariance Q, and is correlated
with the noise $\omega$ according to $M E(\omega w') = Q$. This interpretation
will be referred to as the error-driven
closed loop decomposition.


Figure  1  The  Closed  Loop  System. 


Figure  2  The  Error-Driven  Closed  Loop  Decomposition. 

In terms of this decomposition, we define a
fictitious output $z = z_1 + z_2$, where $z_1$ is the (n+m)-dimensional
output from the second subsystem and $z_2 = -\Gamma\tilde{x}$,
where $\Gamma$ is the (n+m) by n matrix with $\Gamma' = [\,C'B^{1/2}, \ 0\,]$
and L is as defined in (9). Because the outputs $z_1$ and
$z_2$ are 'maximally interfering' (due to their correlation),
the variance of their sum z is actually the
difference of their individual variances, which is the
integrand in (20).

Another interesting representation (Fig. 3) can be
derived starting from the Eqs. (10) and (18). Again a
cascade is formed, beginning with the system (F-KH, M, I)
driven by standard white gaussian noise. This time the
output (which is the representation of the state
estimation error) drives, via the 'transmission matrix'
KH, the system (F-GC, I, L). This system will have
the variable $\hat{x}$ as state, if again an additional
'correction' input Kv (with variance KRK') is added,
having a correlation T with $\omega$ satisfying:

$$M T K' = E\,(M\omega)(Kv)' = -KRK'$$

Figure 3 The Error-Driven Estimator Decomposition.

A fictitious output $z = z_3 + z_4$ is defined which generates
the integrand in the performance index. In this
case, it is readily verified that this is accomplished
by $z_3$, the output of the cascade, and $z_4 = N\tilde{x}$, where
$N' = [\,0, \ A^{1/2}\,]$. Note that in this decomposition, the
input to the $\hat{x}$-subsystem is actually K times the
innovations process. This is known to act as a
white noise. Here $z_3$ and $z_4$ are noninterfering
(uncorrelated). This decomposition will be called the
error-driven estimator decomposition.

3.2  Interpretation  of  the  Cost  Functional 


The two decompositions described in the last paragraph
lead to a 'cost decoupled' interpretation of the
various terms. For simplicity, we shall fix the ideas
on the LQG regulator problem. The matrices of the system,
noise, and performance index are all supposed to be
time-invariant, and it will be assumed that the
statistical stationary state exists. For the steady
state regulator problem, the cost rate, rather than the
cost (which is infinite), is computed. Let, in the LQG
terminal controller, $t_0$ and $t_f$ approach $-\infty$ and $+\infty$,
respectively. Then the cost rate is the limit of the
expected cost per time unit. It follows then from the
Eqs. (20)-(22) that the cost rate is

$$J = \operatorname{Tr}\{A\Pi + C'BC\,\Sigma\} = \operatorname{Tr}\{SQ + C'BC\,P\} = \operatorname{Tr}\{AP + KRK'S\} \qquad (23)$$

where now P and S satisfy the algebraic Riccati equations,
respectively

$$(F - KH)P + P(F - KH)' + MM' = 0 \qquad (24)$$

$$(F - GC)'S + S(F - GC) + L'L = 0 \qquad (25)$$

As already discussed, the following identifications
follow, where the '*' entry is irrelevant:

P is the REACHABILITY gramian for the system (M, F-KH, *)

S is the OBSERVABILITY gramian of the system (*, F-GC, L)
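The equality of the three expressions for the steady state cost rate can be checked on a scalar example, where the algebraic Riccati equations have closed form solutions. The sketch below is only an illustration under the assumptions n = m = p = 1, nonzero g and h, and positive weights; the lower case names stand for the scalar versions of F, G, H, A, B, Q, R and are not notation from the text.

```python
import math


def scalar_lqg_cost_rates(f, g, h, a, b, q, r):
    """Steady state scalar LQG solution and the three cost rate expressions."""
    # Stabilizing solutions of the control and filter algebraic Riccati equations.
    s = b * (f + math.sqrt(f * f + a * g * g / b)) / (g * g)
    p = r * (f + math.sqrt(f * f + q * h * h / r)) / (h * h)
    c = g * s / b   # control gain, C = B^{-1} G' S
    k = p * h / r   # filter gain,  K = P H' R^{-1}
    # Variance of the estimate from its Lyapunov equation
    # 2 (f - g c) sigma + k^2 r = 0, and state variance Pi = Sigma + P.
    sigma = -k * k * r / (2.0 * (f - g * c))
    pi_var = sigma + p
    j_state = a * pi_var + c * c * b * sigma   # Tr(A Pi + C'BC Sigma)
    j_control = s * q + c * c * b * p          # Tr(S Q + C'BC P)
    j_filter = a * p + k * k * r * s           # Tr(A P + K R K' S)
    return s, p, (j_state, j_control, j_filter)


s1, p1, rates1 = scalar_lqg_cost_rates(f=-1.0, g=1.0, h=1.0, a=3.0, b=1.0, q=3.0, r=1.0)
s2, p2, rates2 = scalar_lqg_cost_rates(f=1.0, g=2.0, h=1.0, a=1.0, b=1.0, q=1.0, r=4.0)
print(rates1)
print(rates2)
```

In the scalar case the canonical Riccatian reduces to the single number sqrt(S P), which is invariant under a rescaling of the state.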

Consider now a fictitious system, consisting of
the two decoupled systems $(F-GC, Q^{1/2}, L)$ and $(F-KH, M, \Gamma)$.
If both are driven by independent standard white gaussian
noise, then their outputs (respectively $z_1$ and $z_2$)
will be uncorrelated as well, and their expected
'powers' additive. The contribution of the first is
exactly $\operatorname{Tr}(Q\,W_o(F-GC, L)) = \operatorname{Tr}(QS)$, while the contribution
of the second is $\operatorname{Tr}(\Gamma'\Gamma\,W_r(F-KH, M)) = \operatorname{Tr}(C'BCP)$,
where $W_o$ and $W_r$ denote the observability and reachability
gramians of the indicated systems. We used the fact that in the steady state,
the A-weighted output power of a system (F,G,H) driven
by a zero mean white gaussian input of variance Q, can
be expressed as
But note that this is exactly the configuration of
the error-driven closed loop system decomposition,
Fig. 2, if one replaced the input noises by uncorrelated
ones, and set the transmission matrix GC to
zero. The two partial outputs act then indeed
noninterfering. This leads to the INTERPRETATION
that Tr{QS} is the partial cost rate in the deterministic
closed loop system due to the process noise w only,
i.e., assuming full knowledge of the state x. The part
Tr(C'BCP) is the cost rate due to the state estimation
error. It is as if the role of the transmission matrix
GC and the input correlation is to guarantee maximal
destructive interference between the outputs (so that
the variance of the output is the difference of the
individual variances).

Similarly, with regards to the error-driven estimator
decomposition, Fig. 3, we have the estimator
subsystem (F-GC, K, L) driven by the innovations $\varepsilon$, and
with output $z_3$, and the estimation error system
(F-KH, M, N) driven by $\omega$ and with output $z_4$. The two
outputs are uncorrelated, and their sum $z = z_3 + z_4$ has
covariance equal to the cost rate of the optimally
controlled system. A cost equivalent decoupled form
can be constructed, consisting of the system (F-GC, K, L)
(the estimator system) driven by v, considered independent
of the noise $\omega$, which drives the error system
(M, F-KH, N). Their output 'power' contributions are
respectively Tr{KRK'S} and Tr{AP}. Thus the role of
the transmission matrix KH seems to effectively
uncorrelate the two input noises.


The contribution Tr(KRK'S) can thus be identified
as the cost rate for the closed loop system under the
assumption that the estimated state were the correct
state. However, since not $\hat{x}$, but x is the state of the
closed loop system, a correction occurs due to the
imperfect knowledge of the state (i.e., $\tilde{x}$). This is
represented by the (independent) contribution of the
error subsystem (M, F-KH, N), driven by the equivalent
noise $\omega$, with cost rate contribution Tr{AP}.

3.3 LQG Balanced Realizations

Since S and P transform under a similarity transformation
T as $T^{-T} S T^{-1}$ and $T P T'$, respectively, it is
possible to transform any given realization such that
in the new coordinate system,

$$S = P = \Omega$$

where $\Omega$ is diagonal, with its elements ordered in
magnitude. $\Omega$ is called the CANONICAL RICCATIAN. The
new realization will then be referred to as the LQG
BALANCED realization. Note that Q as well as A also
transform under the similarity. (B and R are, of
course, invariant as they relate to 'external'
variables.)

The cost rate for the optimally regulated system
is then in the balanced coordinates

$$J = \operatorname{Tr}\,\Omega(Q + C'BC) = \operatorname{Tr}\,\Omega(A + KRK')$$

or, using the fact that $\Omega$ is diagonal:

$$J = \sum_i \Omega_i \big( Q_{ii} + \Omega_i^2\,(G B^{-1} G')_{ii} \big) = \sum_i \Omega_i \big( A_{ii} + \Omega_i^2\,(H' R^{-1} H)_{ii} \big)$$

It is clear that the cost rates corresponding to the
individual state components are not simply determined
by the magnitudes of the elements of the canonical
Riccatian, but also depend on the relative magnitudes
of the diagonal elements of Q, $G B^{-1} G'$, or A and $H' R^{-1} H$.

If the system (in balanced coordinates) is partitioned
into two coupled subsystems with $\Omega_1 > \Omega_2$, then a
sufficient condition for the part corresponding with $\Omega_1$
to have the dominant cost rate contribution is that
either of the following sets of inequalities is
satisfied (the indices refer to the block entries):

$$\Omega_1 > \Omega_2, \quad Q_{11} > Q_{22}, \quad (G B^{-1} G')_{11} > (G B^{-1} G')_{22} \qquad (30)$$

$$\Omega_1 > \Omega_2, \quad A_{11} > A_{22}, \quad (H' R^{-1} H)_{11} > (H' R^{-1} H)_{22} \qquad (31)$$

The various terms can be interpreted as quantifying
the following:

Q : Disturbance (noise) in the plant.

$G B^{-1} G'$ : 'Potential' of the system input to decrease
the regulation cost.

A : Cost on the state deviations.

$H' R^{-1} H$ : Information (about the state) gained from
the measurements (observations).

The first set of inequalities expresses that the
set of variates that are most disturbed by noise and
for which at the same time the input potential is high,
are dominant. (The larger the input potential, the
less the cost of control.) Alternatively, state
variables for which the information contained in the
measurements and the state cost is highest, also contribute
to the major parts in the regulation cost rate.

If one were only interested in obtaining a simple
model for the optimally regulated system, for instance
with the goal of identifying the dominant contributions
to the uncertainties and the costs, then the combined
plant and regulator may be reduced by 'projection of
dynamics.' The decision on the order of the reduced
model can, for instance, be based on thresholding
ratios formed with $\operatorname{Tr}\,\Omega$ and with the partial cost rate

$$J(r) = \sum_{i=1}^{r} \Omega_i \big( Q_{ii} + \Omega_i^2\,(G B^{-1} G')_{ii} \big)$$

If, on the other hand, one wants to design a
reduced order regulator for a fixed plant, the above
cannot be taken over directly. It has been shown that
the design of a reduced order regulator based on a
reduced order model of the plant may be unsatisfactory.
Also, a 'projection of dynamics' approach on the full
order combined plant and regulator, based on the magnitude
of the elements of the canonical Riccatian alone
may not guarantee the stability of the regulation of
the (full order) plant with the obtained reduced
regulator [2]. The problem is here, of course, the
feedback around the system.

In reference to the decompositions in Figs. 2 and
3, the following property of the transmission matrix is
derived.

Theorem. The lower (upper) triangular part of the
transmission matrix GC (KH) is dominant in the balanced
coordinates.

Proof. Since $T = GC = G B^{-1} G' \Omega$, it follows that $\Omega T$ is
symmetric. But then $\Omega_i T_{ij} = T_{ji} \Omega_j$ for all i and j.
Hence, $T_{ji} = T_{ij}\,\Omega_i / \Omega_j$. Since by assumption the elements
$\Omega_i$ are ordered (decreasingly), we get $|T_{ji}| \ge |T_{ij}|$ whenever j > i.

Similarly, $X = KH = \Omega H' R^{-1} H$ and thus $X\Omega$ is symmetric,
from which $|X_{ji}| \le |X_{ij}|$ whenever j > i.

Consider the error-driven closed loop decomposition.
Keeping the ordering in mind for the balanced
case, it follows that low uncertainty states are more
perturbed by the high uncertainty states than vice
versa. It further follows from the positive semidefiniteness
of the 'input-potential' that for j > i

$$(G B^{-1} G')_{ij}^2 \le (G B^{-1} G')_{ii}\,(G B^{-1} G')_{jj}$$

and thus that the elements in the upper left block of
$(G B^{-1} G')\Omega$ are larger in magnitude than the elements in
the upper right block of T. Hence, $x_2$ is almost
decoupled from the closed loop system (I, F-GC, L). In
fact, for the same reason the upper right block of the
closed loop system matrix F-GC will be close to that of
F itself, so that there is almost no feedback from the
$x_2$ subsystem. One expects, therefore, that the closed
loop dynamics of the plant controlled with the reduced
regulator would have a near optimal behavior. Similar
arguments work with Fig. 3.

4. Conclusions

Through some physical insight into the various terms
in the optimal performance index, it was shown that the
canonical Riccatian alone does not give sufficient
information for a faithful decision towards a reduced
order model. Sufficient conditions involving
additional parameters (30) or (31) have been given that
enable such a faithful reduced model design. Some
further generalizations (the terminal controller and
follower problem for time varying stochastic systems)
are presently under investigation.
ON REDEFINING THE OPTIMAL LEAST SQUARES FILTER UNDER FLOATING POINT OPERATIONS


E. I. Verriest


School  of  Electrical  Engineering 
Georgia  Institute  of  Technology 
Atlanta,  Georgia  30312 


1. INTRODUCTION

The linear least-squares filtering problem
is revisited. The optimal solution is of course
the celebrated Kalman filter. However, the
optimality of the solution requires that all
computations be performed with infinite precision,
which is, of course, out of this world. The
question we therefore pose is: 'Given the wordlength
constraints of the digital machine (which result
in finite precision), what is the optimal solution
to the least squares problem?' Preliminary
results indicate that if arithmetic errors
(roundoff or chopping) are modeled as additional
noise sources, then the original problem can be
recast as an equivalent least squares problem, now
assuming infinite machine precision, but with
additional noise terms.

Under floating point operations, the
arithmetic error depends on the magnitude of the
operands, and hence the equivalent problem will
turn into a nonlinear filtering problem. In
order to simplify the mathematics the diffusion
approximation is taken. Exact solution of the
nonlinear least squares problem is computationally
prohibitive since the entire distribution
needs to be updated (via the Kolmogorov
(Fokker-Planck) partial differential equation). Several
approximations are proposed: the truncated
second order filter, the Gaussian second order
filter, and the method of the quasi-moments
(Hermite functions). Since this nonlinear filter
has a different structure than the one for which
the equivalent noise model was derived,
implementation of it on the digital machine will induce a
second order type arithmetic error. This error
is either assumed to be small and therefore
neglected (preliminary results proved that this
was indeed the case under some circumstances), or
one can incorporate the error again in an update
to a new equivalent nonlinear problem, and keep
iterating if needed. A fixed point theorem
argument shows convergence to an 'exact' nonlinear
equivalent model.

This work should complement the recent work
by Moroney [1]. There, only fixed point
arithmetic is considered. Previous work along these
lines includes the work on estimation and control
with quantized measurements by Curry [2]
(although infinite precision computations are
assumed), and the work by Miller and Wrathall
[3]. The floating point arithmetic modeling
relies on the work of Knuth [4] and Vandergraft
[5]. The nonlinear estimation methods used are
standard (e.g., [6]).

The next section describes some preliminary
results. A full exploration will be deferred to
a later paper. The problem is set up as a least
squares problem in discrete time. A stochastic
model is given and the floating point constraints
are to be taken into account. First an
equivalent unconstrained model is set up in discrete
time. The noise enters nonlinearly in this
model. It is shown below that an elegant
solution can be obtained by 'embedding' this discrete
model into a continuous one, in which the noise
enters in an additive fashion. This step is what
we refer to as the diffusion approximation. This
continuous time model is then solved (exact
solutions are possible in some cases) or
approximated, and finally, the resulting continuous filter
is discretized to yield the (sub)optimal discrete
filter for the constrained problem. These three
steps are discussed more specifically for the
linear filter problem (Kalman filter) for the
model

$$x_{k+1} = \Phi x_k + \Gamma u_k$$
$$y_k = H x_k + v_k \qquad (0)$$

2. EQUIVALENCE OF THE FLOATING POINT
CONSTRAINED LINEAR PROBLEM AND
THE UNCONSTRAINED NONLINEAR PROBLEM

For these preliminary results it will be
assumed that all gains can be precomputed in
infinite precision, so that only roundoff
involving the estimator state update needs to be
considered. If now a certain formula yields a
certain vector $x$, then the actual computation
will result in a finite precision approximation
$Q[x]$ of it. Hence we can write $x = Q[x] + e$.
Applying this to the optimization problem for the
measurement update yields the 'desired'
update $\hat{x}_k$ from the a priori estimate $\bar{x}_k$ and
prior covariance $P_k$. The actual covariance is
$M_k + \Psi_k$, where $M_k^{-1} = P_k^{-1} + H'R^{-1}H$ and $\Psi_k$ is the
truncation error covariance. Similarly, the time
update yields $\bar{x}_{k+1} = \Phi\hat{x}_k$ with actual covariance
$$P_{k+1} = \Phi (M_k + \Psi_k) \Phi' + \Gamma Q \Gamma' + \bar{\Psi}_{k+1} \qquad (1)$$

where $\bar{\Psi}_{k+1}$ is again the truncation error covariance
induced by the $x_{k+1}$ computation. The overall a
priori covariance update results in

$$P_{k+1} = \Phi P_k \Phi' + \Gamma Q \Gamma' + \Phi \Psi_k \Phi' + \bar{\Psi}_{k+1} \qquad (2)$$
$$\qquad\qquad - \Phi P_k H' (H P_k H' + R)^{-1} H P_k \Phi' \qquad (3)$$

But this is exactly the covariance one would have
had for the unconstrained filter for the system

$$x_{k+1} = \Phi x_k + \Gamma u_k + e_k \qquad (4)$$
$$y_k = H x_k + v_k \qquad (5)$$

where the covariance of $e_k$ is $\Phi \Psi_k \Phi' + \bar{\Psi}_{k+1}$.
One major difference occurs, and that is that for
the above model the estimate $\hat{x}$ and the error $\tilde{x}$
will be conditionally independent. This is not
the case in the original problem. Hence the
equivalence is only with respect to the error
covariance. So far the pdf of the truncation or
rounding error has not been modeled. Although
this error is obviously uniform in the least
significant digit, it will be approximated by a
Gaussian distribution (which is even exact for
the second order truncated moment filter). The
results in [5] for floating point operations can
easily be extended to vector and matrix
operations. It turns out that the floating point
errors in each component are proportional to that
component. Hence the error in the update $\hat{x}_k$
can be written as

$$e = \hat{x} - Q(\hat{x}) = \gamma\,\mathrm{diag}(\hat{x})\,\epsilon \qquad (6)$$

where $\epsilon$ is a random vector with mean zero and
covariance $I$ and where $\gamma$ is some predeterminable
constant.

Assuming that $\hat{x}$ is sufficiently close to $x$,
equation (6) gets transformed into

$$e = \gamma\,\mathrm{diag}(x)\,\epsilon \qquad (7)$$

With the previous discussion this results then
finally in the equivalent model

$$x_{k+1} = \Phi x_k + \Gamma u_k + \gamma\,\mathrm{diag}(x_{k+1})\,\epsilon_k \qquad (8)$$
$$y_k = H x_k + v_k \qquad (9)$$

The nonlinearity arises in the read-in matrix for
the noise vector $\epsilon_k$. Equation (8) is furthermore an
implicit equation in $x_{k+1}$ because of the
appearance of this term on both sides. Rather than
converting this into an explicit form (which
destroys additivity of the noise), a diffusion
approximation is described in the next section.

3. THE DIFFUSION APPROXIMATION

Using the backward difference (if $\Phi$ is
nonsingular), the discrete model is transformed into
a continuous model. If $\Delta$ is the time step, then
we set $k\Delta = t$. Under the assumption that $\Delta$ is
sufficiently small, this leads to the Ito
differential equation

$$x(t)\,dt = \Phi\,[x(t)\,dt - \Delta\,dx(t)] + \Gamma\,du(t) + \gamma\,\mathrm{diag}(x(t))\,d\epsilon(t)$$

or

$$dx(t) = \frac{1}{\Delta}(I - \Phi^{-1})\,x(t)\,dt + \frac{\Phi^{-1}\Gamma}{\Delta}\,du(t) + \frac{\gamma\,\Phi^{-1}}{\Delta}\,\mathrm{diag}(x(t))\,d\epsilon(t) \qquad (10)$$

which is of the form

$$dx(t) = F(t)\,x(t)\,dt + G(x(t))\,dw(t) \qquad (11)$$

where $w(t)$ is a standard Brownian motion with
incremental covariance $I$, and

$$G(x(t))\,G'(x(t)) = \frac{1}{\Delta}\,\Phi^{-1}\left[\Gamma Q \Gamma' + \gamma^2\,\mathrm{diag}^2(x(t))\right](\Phi')^{-1} \qquad (12)$$

If $\Phi$ were singular, the forward difference can be
used to obtain an approximating Ito equation like (11).
Aspects of the 'continuization' of a discrete
system in the deterministic case are developed in
[7].

4.  APPROXIMATE  SOLUTIONS  OF  THE 
NONLINEAR  PROBLEM 

To demonstrate the feasibility of the
proposed method, we generate several approximations
for the case of a first order system

$$dx(t) = f(t)\,x(t)\,dt + \sqrt{q_1 + q_2 x^2(t)}\,dw(t) \qquad (13)$$

The truncated second order filter [6] gives the
time updates

$$\dot{\hat{x}}(t|t_{i-1}) = f(t)\,\hat{x}(t|t_{i-1}) \qquad (14)$$

$$\dot{P}(t|t_{i-1}) = 2f(t)\,P(t|t_{i-1}) + q_2 P(t|t_{i-1}) + (q_1 + q_2\hat{x}^2(t|t_{i-1}))$$
$$\qquad\qquad = (2f(t) + q_2)\,P(t|t_{i-1}) + q_1 + q_2\hat{x}^2(t|t_{i-1}) \qquad (15)$$

It is clear that the covariance decays slower due
to the roundoff noise. In fact, stability of
$f(t)$ is no longer sufficient to assure
convergence! The measurement update formulas give

$$\hat{x}(t_i^+) = \hat{x}(t_i^-) + K(t_i)\,(y_i - h\hat{x}(t_i^-)) \qquad (16)$$

$$P(t_i^+) = P(t_i^-) - K(t_i)\,h\,P(t_i^-) \qquad (17)$$

$$K(t_i) = P(t_i^-)\,h\,r^{-1}(t_i) \qquad (18)$$

$$r(t_i) = h^2 P(t_i^-) + R(t_i) \qquad (19)$$

These formulas are identical to the ones under the
infinite precision assumption. One can also use
the Gaussian second order filter. The
measurement updates are again given by (16)-(19), but now
the time update for the covariance has the
additional term

$$\tfrac{1}{2}\,q_1 q_2\,(q_1 + q_2\hat{x}^2(t|t_{i-1}))^{-3/2}\,P^2(t|t_{i-1})$$

in the right hand side. The estimate update is
again as in (14). We finally remark that second
order filters generally provide a performance
superior to that of first order techniques (such
as the extended Kalman filter), especially for
small noise strengths [6].
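
The scalar truncated second order filter can be sketched in a few lines of code. This is only an illustration under stated assumptions: constant $f$, a plain Euler discretization of the time updates (14)-(15) with a hypothetical step size `dt`, and the measurement update written as in (16)-(19). The function names and all numerical values are not from the text.

```python
def time_update(x_hat, p, f, q1, q2, dt, steps):
    """Euler integration of the time updates (14)-(15) for constant f."""
    for _ in range(steps):
        x_dot = f * x_hat                                   # (14)
        p_dot = (2.0 * f + q2) * p + q1 + q2 * x_hat ** 2   # (15)
        x_hat += dt * x_dot
        p += dt * p_dot
    return x_hat, p


def measurement_update(x_hat, p, y, h, r_meas):
    """Scalar measurement update following (16)-(19)."""
    r = h * h * p + r_meas                 # (19) innovation variance
    k = p * h / r                          # (18) gain
    x_new = x_hat + k * (y - h * x_hat)    # (16)
    p_new = p - k * h * p                  # (17)
    return x_new, p_new


# Without roundoff noise (q2 = 0) a stable f gives a bounded covariance.
_, p_exact = time_update(1.0, 0.0, -1.0, 2.0, 0.0, 0.001, 20000)

# With q2 > -2f the covariance grows although f itself is stable.
_, p_round = time_update(1.0, 1.0, -0.1, 0.0, 0.3, 0.01, 5000)
```

The second call illustrates the remark that stability of $f$ alone no longer assures convergence: the effective rate in (15) is $2f + q_2$, which is positive for the values chosen here.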

Higher order moment filters can be generated
as well, e.g. the cumulants truncation (of which
the second order Gaussian is a special case).
Since in fact we expect nearly Gaussian
distributions, another good alternative is the
conditional quasi-moment method. Here the unknown
density is expanded in terms of the Hermite
functions

$$\frac{H_n(x)}{\sqrt{n!}}\,f_G(x)$$

where $f_G$ is the nominal Gaussian distribution.

Also, direct approximate solutions of
the Fokker-Planck equation can be developed,
e.g. discretization and model reduction via
balanced realizations. Preliminary work on the
feasibility of this scheme is in progress.

In all these cases the remaining step is to
discretize the time update equations. The
resulting discrete filter has the structure

$$\hat{x}(k+1) = \Phi\hat{x}(k) + K(k)\,(y_k - H\hat{x}(k))$$

but now $K(k)$ differs from the Kalman gain, and is
data dependent.

Exact solutions exist for an important
class. Namely, if there is no (real) process
noise ($Q \equiv 0$), then the only noise in the
equivalent model is the quantization noise, and the
diffusion model then leads to bilinear stochastic
differential equations. This will be the case,
for instance, in all deterministic processing, or
in systems involving pure Newtonian dynamics, as
for example in spacecraft.

A theory of filters for bilinear systems has
been developed [8-11] and revolves around the
theory of Lie algebras. The important result is
that if the underlying Lie algebra is solvable
[12], the exact moments (and hence solutions) can
be computed. These results are applicable to
problems of satellite tracking and rigid body
orientation.


Finally,  we  remark  that  we  can  build  on 
existing  work  [13]  for  treating  the  combined  LQG 
problem  under  floating  point  arithmetic. 

5. CONCLUSION

A novel solution or approximation to the
least squares filter problem under floating point
arithmetic is presented for a linear stochastic
model.

To answer the question of where the benefit of
this study will be, it is perhaps easiest to
state that if the stochastic model has low order
and possesses slow time constants, and if a
general purpose computer with large wordlength is
available, then the finite wordlength effects
are going to be negligible and there will be no
benefit from this study. On the contrary, if one
deals with microprocessor control of large scale
systems and/or systems with multiple time scales
(singularly perturbed systems), then potential
benefit will be gained from a deeper study of the
optimality. The resulting optimal filter may
turn out to be nonlinear, but this does not
necessarily increase the complexity significantly
(e.g. multiplication versus addition).

The more detailed study, in progress,
considers the floating point constraints of the
gain sequence computation as well. The latter is
obviously data dependent. The information filter
form is believed to yield the best approach.
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ERROR  ANALYSIS  OF  LINEAR  RECURS  I  OKS  IN  FLOATING  POINT 
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ABSTRACT 

A  stochastic  error  model  for  floating  point 
arithmetic  is  developed  and  used  to  study  the 
effects  of  finite  wordlength  on  linear  recursive 
formulas.  Optimal  realizations  e>Ast,  but  they 
are  highly  sensitive. 


1. INTRODUCTION

The accumulation of roundoff error in long
computerized calculations and recursive algorithms
is a phenomenon that can destroy an efficient and
sound computational procedure based on arithmetic
over the real number field. A telltale example
is the Kalman filter divergence. Analytically
this state estimator yields the estimate with
minimal covariance, but the actual error
covariance may be much larger than the predicted
covariance (which solves a certain Riccati equation) or
even grow unboundedly whenever a finite precision
version of the algorithm is implemented.

It is clear that in order to keep track of
the confidence in the computed results, a certain
measure of confidence should be computed and
'tracked' with each operation or update.

One such measure is the rigorous computation
of error bounds via interval analysis. Not only
are the additional computations that are required
time consuming, but the obtained results may be
overly conservative. The individual rounding
errors in a compounded expression indeed tend to
cancel rather than to reinforce each other if an
unbiased rounding rule is used. In this paper
another measure of confidence is used. It is of a
more probabilistic nature and estimates the
propagated covariance due to the finite wordlength
errors. Such a study is standard (and
straightforward) for fixed point arithmetic, and is well
described in several textbooks, culminating in
the optimal filter implementations by Mullis and
Roberts [8] and the LQG compensator by Moroney
[7]. One of the serious shortcomings of the use
of fixed point arithmetic is the necessity of
scaling in order to provide a higher accuracy.
This disadvantage is obliterated when floating
point arithmetic is used.

Modern digital technology has rapidly
increased the speed of floating point processors.
As a result, these modules are increasingly
introduced in real time applications of estimation,
control, digital filtering, and general signal
processing, and the need for a comprehensive
analysis of their limitations (due to finite wordlength
effects) is obvious. No fully comprehensive model
for the wordlength effects in floating point
exists, although some very significant
contributions have been made in the past. Attempts to
give a rigorous analysis of a sequence of floating
point operations have proven to be so formidable
that one has to content oneself with
plausibility arguments (e.g., see [4], p. 213). A
noteworthy contribution is the axiomatization by
Kulish [5].


General statistical modeling of floating
point errors relies on the work of Wilkinson [16],
Forsythe and Moler [2], Stewart [11], Knuth [4],
and Vandergraft [12]. Brent [1] and Marasa and
Matula [6] performed extensive simulations for
various finite precision arithmetic systems.
An analysis of the effects in digital filtering
is due to Kan and Aggarwal [3] and others. Rink
and Chong analyzed the performance of a floating
point state regulator [9]. Van Wingerden and De
Koning [13] recently combined this work with a
Monte Carlo identification technique. A dynamical
stochastic model was used by Verriest [14] in the
computation of a gain correction for (Kalman)
filtering applications in floating point. In this
paper, the finite wordlength effects on discrete
linear recursions of the form (dim x = n, dim u =
m, dim y = p)

$$x_{k+1} = A x_k + B u_k$$
$$y_k = C x_k \qquad (1)$$

are analyzed. The paper is organized as follows.
First a model for the floating point errors is
discussed. The third section then uses this error
model to obtain representations for various
confidence measures.


This  work  was  supported  by  the  U.S.  Air  Force 
under  Contract  P-08635-84-C-0273. 


2.  THE  ERROR  MODEL 

We shall work with normalized floating point
numbers. For a given base b, they are expressed
by two numbers, e and f. Here e is an integer
exponent and f is a signed fraction assumed to be
normalized, i.e., $b^{-1} \le |f| < 1$. The value of the
floating point number is then

$$f \cdot b^{e}$$


Since the coding of the exponent e and the signed
fraction must fit into a given wordlength (w),
there will be a tradeoff between the precision and
the range of the representable numbers in the
computer [1]. However, in this work we shall
assume an infinite range and only consider the
effects of a finite number of digits in the
fraction. Besides simplifying the analysis, this
assumption can be substantiated by the fact that
normally an underflow or overflow would cause
program termination, and we are only interested
in the finite wordlength effects during normal
program execution. Following Kulish [5], the
situation is as follows. We have R, the set of
reals, and an operation $*$ (which is $+$, $-$, $\times$,
or $\div$). On the computer the elements of R, as well
as the results of $a * b$, are not exactly
representable. Hence the reals must be mapped into a subset
P according to a proper (i.e., monotone and
symmetric) mapping $Q: R \to P$. The approximation of the $*$
operation is then $Q[*; \cdot, \cdot]$:

$$Q[*; a, b] \triangleq Q(a * b) \qquad (2)$$

Unfortunately, the in general not representable
result $a * b$ seems to be necessary for its
realization. It can be shown, however, that in all
cases where $a * b$ is not exactly representable, it
is sufficient to replace it by an appropriate and
representable value $a \mathbin{\tilde{*}} b$ with the property

$$Q(a \mathbin{\tilde{*}} b) = Q(a * b) \qquad (3)$$

Then the proper definition is

$$Q[*; a, b] \triangleq Q(a \mathbin{\tilde{*}} b) \qquad (4)$$

The concrete algorithms for the realization of
this formula can then be decomposed into four
steps.

1. Identification of the exponent and fraction of a and b.

2. Execution of the (approximate) operation on a and b.

3. Renormalization.

4. Mapping into P (because accumulation of higher wordlength may be used).
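
The mapping Q can be illustrated with a small sketch. It assumes base b = 2, an unbounded exponent range (as in the text), and round-half-to-even as the unbiased rounding rule. The names `quantize` and `fl_mul` and the parameter `bits` (number of fraction bits kept) are hypothetical, and the operation itself is carried out in double precision, standing in for the "higher wordlength" accumulation of step 4.

```python
import math


def quantize(x, bits):
    """Round x to a normalized binary fraction with `bits` fraction bits."""
    if x == 0.0:
        return 0.0
    # Step 1: split into fraction and exponent, x = f * 2**e, 0.5 <= |f| < 1.
    f, e = math.frexp(x)
    scale = float(1 << bits)
    # Step 4: round the fraction (Python's round() is round-half-to-even).
    f_rounded = round(f * scale) / scale
    # Step 3: ldexp renormalizes automatically if the fraction rounded to 1.
    return math.ldexp(f_rounded, e)


def fl_mul(a, b, bits):
    """Q[*; a, b] for multiplication: exact product (step 2), then mapping."""
    return quantize(quantize(a, bits) * quantize(b, bits), bits)


q = quantize(0.1, 8)
rel_err = abs(quantize(0.3, 8) - 0.3) / 0.3
```

Because the fraction satisfies $|f| \ge 1/2$, the relative error of one mapping is bounded by $2^{-\text{bits}}$ in this sketch, independently of the magnitude of the number.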

The cause of floating point errors is
threefold. First, there are intrinsic errors, due to
finite wordlength representation of a given number
(parameters, inputs...). Even if two numbers have
an exact representation, binary operations (sum,
product...) on them may require a longer
wordlength for exact representation. Hardware
implementation greatly affects this error (presence or
absence of guard bits, double register arithmetic,
rounding or truncation...). The errors induced by
binary operations are referred to as extrinsic
errors. Finally, all these errors propagate
through the recursion and hence accumulate. They
are called the inherent errors since they inherit
their properties from the operation sequence and
the given recursion.

The intrinsic errors are bounded by the 'unit
in the least significant digit' times $b^e$. The
error is therefore uniform in an interval with
length proportional to the number itself (i.e.,
$b^e$). The extrinsic errors are also proportional
to the computed result [12], with the exception of
subtraction of nearly equal quantities, which may
cause a blowup of the relative error. For the
inherent error (also called accumulated error)
Wilkinson [16] (also Forsythe and Moler [2]) gives
error bounds which are proportional to the computed
result. If y is the exact result of a combination
of n multiplications and divisions, then the
relative error in the computed result is bounded by
$n\gamma_0$ for some $\gamma_0$. Many other bounds on the
rounding errors in algebraic processes are also of
the form $f(n)$. The linear (in n) bound is rather
conservative, for the individual rounding errors
in a compounded expression tend to cancel rather
than to reinforce each other if an unbiased
rounding rule is used. With biased rounding and
truncation, the bound may be more realistic.

Because of the above observations, we are led
to a stochastic model for the finite wordlength
error,

$$y - Q[y] = \gamma(n)\,y\,\epsilon \qquad (5)$$

where $\epsilon$ is a sample of a standard white Gaussian
process and $\gamma(n)$ is a normalization factor,
dependent on the number of operations. The
'large' samples $\epsilon$ then simulate the occasional
blowup due to subtraction of nearly equal
quantities (occurring with empirical frequency .14 [4]).
It can be shown (by considering error accumulation
in one single batch of $n^2$ or $n$ batches of $n$
operations) that for consistency of (5) $\gamma(n)$ must be
of order $\sqrt{n}$.
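
The consistency argument for the $\sqrt{n}$ growth can be sketched as follows. The sketch rests on assumptions that are not spelled out in the text: the relative errors of successive batches are independent, they accumulate additively to first order, and $\gamma(n)$ follows a power law $c\,n^{\alpha}$.

```latex
% One batch of n^2 operations versus n batches of n operations each.
\begin{align*}
\text{single batch:}\quad & \operatorname{Var}\!\left[\frac{y - Q[y]}{y}\right] = \gamma^2(n^2), \\
\text{$n$ batches:}\quad  & \operatorname{Var}\!\left[\sum_{i=1}^{n} \gamma(n)\,\epsilon_i\right] = n\,\gamma^2(n), \\
\text{consistency:}\quad  & \gamma^2(n^2) = n\,\gamma^2(n)
   \;\Longrightarrow\; c^2 n^{4\alpha} = c^2 n^{1+2\alpha}
   \;\Longrightarrow\; \alpha = \tfrac{1}{2}.
\end{align*}
```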

Remark: The above approximation (5) will be
invalid if the number of binary operations greatly
exceeds the number of independent variables
occurring as operands.

3. ANALYSIS OF LINEAR RECURSIONS

The formulas (1) are generic state space
representations for digital filters or
compensators, as for instance used in feedback
controllers. The signals $u_k$ and $y_k$ are
respectively the input and the output vectors. As
explained in the previous section, the bilinear
error model is assumed, as well as a perfect
representation of the parameter matrices A, B,
and C. (This entails no loss of generality since
the effect of parameter truncation can always be
'thrown back' to the data [12].) By equation (5),
the recursions in floating point can be modeled by
(assuming the use of unbiased rounding)

$$x^m_{k+1} = A x^m_k + B u_k + \gamma\,\mathrm{diag}(A x^m_k + B u_k)\,D\,w_k \qquad (6)$$

$$y^m_k = C x^m_k + \delta\,\mathrm{diag}(C x^m_k)\,D_1 v_k \qquad (7)$$

where $(w_k', v_k')'$ is an (n + p) dimensional standard
white Gaussian sequence, $\delta$ and $\gamma$ are normalization
parameters which are purely hardware dependent,
while the elements of the matrices D are
realization dependent, and reflect dependencies among the
computed state or output components. (If two
components of the x-vector are updated in identical
ways, then their errors must also remain equal.)
$\delta$ and $\gamma$ are fixed such that the maximal sum of
squares of the elements for each row of D is one.
If A is a full matrix without any particular
structure, then generically one can set D = I, and
$\gamma$ corresponds with (n + m) multiplications and n +
m - 1 (signed) additions. A special case occurs
for instance if a pair (A,b) is in canonical form,
i.e., for
For truncation or biased rounding, the
probabilistic model needs to be adjusted to incorporate
the bias. For each type of arithmetic, a bilinear
state model arises. For this reason we shall
refer to the floating point error model in the
previous section as the 'bilinear error model.'


The propagation of the expected value of the
above model state is

$$\bar{x}^m_{k+1} = A\bar{x}^m_k + B\bar{u}_k \qquad (10)$$


Clearly, if $\bar{u}_k = u_k$ and $\bar{x}^m_0 = x_0$, then the solutions
of (10) and (1) are identical. Therefore, the
exact recursion (1) can be interpreted as the
expectation of the floating point model.
Subtracting (10) from (8), the floating point
error $\tilde{x}^m = x^m - \bar{x}^m$ satisfies

$$\tilde{x}^m_{k+1} = [I + \gamma\,\mathrm{diag}(D w_k)]\,A\tilde{x}^m_k + \gamma\,\mathrm{diag}(D w_k)\,\bar{x}^m_{k+1} \qquad (11)$$


A criterion for almost sure stability can be
established for first order systems based on
Grintsevichyus' theorem. For higher order systems
(the case of interest), we shall only be concerned
with the first and second moments. The following
properties were derived:

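Theorem 1: For the bilinear model (6)-(11), the
error covariances

$$P^m_k \triangleq E\,(\tilde{x}^m_k)(\tilde{x}^m_k)'$$
$$V^m_k \triangleq E\,(\tilde{y}^m_k)(\tilde{y}^m_k)' \qquad (12)$$

satisfy

$$P^m_{k+1} = A P^m_k A' + \gamma^2 DD' \circ (A P^m_k A' + \bar{x}_{k+1}\bar{x}_{k+1}') \qquad (13)$$

$$\bar{x}_{k+1} = A\bar{x}_k + B u_k \qquad (14)$$

$$V^m_k = C P^m_k C' + \delta^2 D_1 D_1' \circ C\,(P^m_k + \bar{x}_k\bar{x}_k')\,C' \qquad (15)$$

A proof is straightforward by 'squaring up' (11)
and taking expectations, noting that $w_k$ is
independent of $\tilde{x}_k$ and $\bar{x}_k$. Finally the identity

$$\mathrm{diag}(x)\,Q\,\mathrm{diag}(x) = xx' \circ Q \qquad (17)$$

is used, where $\circ$ is the Schur product (i.e.,
$(A \circ B)_{ij} = A_{ij}B_{ij}$).

Remarks:

1. If the input $u_k$ is purely random with zero
mean and covariance $Q_k$, then

$$X_{k+1} = A X_k A' + B Q_k B' \qquad (18)$$

is substituted for (14), with $X_k$ replacing
$\bar{x}_k\bar{x}_k'$ in (13) and (15). The
relative error is then the 'ratio' $P^m_k X_k^{-1}$,
where $X_k = E\,x_k x_k'$.

2. Defining E as the unit under $\circ$, i.e., $E_{ij} = 1$,
we can rewrite (13) as

$$P^m_{k+1} = (E + \gamma^2 DD') \circ A P^m_k A' + \gamma^2 DD' \circ X_{k+1} \qquad (13')$$

It is clear that even when A is strictly stable,
$P^m_k$ may grow unboundedly, due to the presence of
the positive Schur factor $E + \gamma^2 DD'$.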
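
The Schur product identity (17) used in the proof is easy to check numerically. The sketch below uses plain nested lists and integer entries so that the comparison is exact; the helper names and the test matrices are hypothetical.

```python
def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def diag(x):
    """Diagonal matrix with the entries of x on the diagonal."""
    return [[x[i] if i == j else 0 for j in range(len(x))]
            for i in range(len(x))]


def schur(a, b):
    """Elementwise (Schur) product of two equally sized matrices."""
    return [[a[i][j] * b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]


x = [1, -2, 3]
Q = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]

lhs = matmul(matmul(diag(x), Q), diag(x))               # diag(x) Q diag(x)
rhs = schur([[xi * xj for xj in x] for xi in x], Q)     # xx' o Q
```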

Theorem 2: The model (6) is second order
stochastically stable if the eigenvalues of A are within
a circle with radius $(1 + \gamma^2)^{-1/2}$.
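
For a first order system the stability radius can be illustrated directly. The following sketch assumes a scalar recursion with D = 1 and a zero mean random input of variance q, so that the scalar versions of (18) and (13') apply. The function names and the numerical values are hypothetical.

```python
def second_moment_factor(a, gamma):
    """Per-step growth factor a^2 (1 + gamma^2) of the scalar error variance."""
    return a * a * (1.0 + gamma * gamma)


def error_variance(a, b, q, gamma, steps):
    """Iterate the scalar forms of (18) and (13') from zero initial values."""
    x_var = 0.0   # X_k = E x_k^2
    p = 0.0       # P_k, variance of the floating point error
    for _ in range(steps):
        x_var = a * a * x_var + b * b * q                         # (18)
        p = (1.0 + gamma ** 2) * a * a * p + gamma ** 2 * x_var   # (13')
    return p


# |a| = 0.9 is strictly stable, but it lies outside the radius
# (1 + 0.5**2) ** -0.5 (about 0.894), so the error variance diverges.
p_coarse = error_variance(0.9, 1.0, 1.0, 0.5, 2000)

# With gamma = 0.1 the radius is about 0.995 and the variance settles.
p_fine = error_variance(0.9, 1.0, 1.0, 0.1, 2000)
```

In the convergent case the limit is $\gamma^2 X_\infty / (1 - a^2(1+\gamma^2))$ with $X_\infty = b^2 q/(1 - a^2)$, which is about 0.289 for the values used here.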


Another measure of the similarity of the computed
result and the exact result of the recursion is given
by the generalized correlation coefficient
between the outputs of the exact ($y$) and the
finite precision system ($y^f$). We define this
generalized correlation coefficient as

$$\rho^f = \lim_{N\to\infty} \rho_N(y, y^f) \qquad (19)$$

$$\rho_N(y, y^f) = \frac{\mathrm{Tr}\{Y (Y^f)'\}}{\sqrt{\mathrm{Tr}\{Y Y'\}\,\mathrm{Tr}\{Y^f (Y^f)'\}}} \qquad (20)$$

where $Y$ is the data matrix $(y_1, \ldots, y_N)$, and
similarly for $Y^f$. Note that $\rho_N(y, y^f)$ can be
written in terms of the sample correlation functions

$$\frac{1}{N}\,\mathrm{Tr}\Big\{\sum_{i=1}^{N} y_i y_i'\Big\}, \qquad \frac{1}{N}\,\mathrm{Tr}\Big\{\sum_{i=1}^{N} y^f_i (y^f_i)'\Big\} \qquad (21)$$

assuming a random input with variance $Q_k$, and
invoking ergodicity. For the model (6)-(11), the
correlation $\rho^m$ can be precomputed.

Theorem 3: The steady state correlation between
the model output $y^m$ and $y$ ($= y^m$ for $\gamma = \delta = 0$) is


,  /  Tr{ClI  C* } 

p“ - 1 -  /  - —  (22) 

/T7?/ 

where $\Pi^m$ and $\Pi_0$ are the solutions to the
(extended) Lyapunov equation
for $\alpha = \gamma^2$ and $\alpha = 0$, respectively.

Note that the quantity $\rho^m$ can easily be
interpreted in terms of a signal to (computation)
noise ratio. Based on the bilinear model we can
now try to find special realizations for which the
error covariance is minimal or the correlation
coefficient is maximal. It was found by
simulation ($DD' = I$) that many equivalent optima exist.
In fact, the error measures fluctuate rapidly
between a minimum and maximum value. This high
sensitivity may make an optimal realization
impractical. We also established the following
(expected) results.

Theorem 4: If $DD' = I$, then scaling (i.e., a
diagonal similarity transformation) leaves the
error properties invariant.

Theorem 5: For a given realization (A, B, C), the
error properties are left invariant if Q is
multiplied by a positive constant.

Remark: We have not touched upon certain
important and interesting issues. If very low
precision is used, it can be shown that trapstates
may exist. These are vectors of floating point
numbers such that $Q[x_{k+1}] = Q[x_k]$ for the undriven
system. Obviously, zero will be a trapstate, but
many more may exist, depending on the iteration
and the precision. Whenever nonzero trapstates
exist, the bilinear model will break down. The
details are under study.
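
A nonzero trapstate is easy to exhibit in a first order example. The sketch below assumes base 2, round-half-to-even, and an unbounded exponent range. The names `quantize` and `iterate`, the 4-bit fraction, and the coefficient 0.99 are hypothetical choices made only for illustration.

```python
import math


def quantize(x, bits):
    """Round x to a normalized binary fraction with `bits` fraction bits."""
    if x == 0.0:
        return 0.0
    f, e = math.frexp(x)                  # x = f * 2**e, 0.5 <= |f| < 1
    scale = float(1 << bits)
    return math.ldexp(round(f * scale) / scale, e)


def iterate(x0, a, bits, steps):
    """Undriven recursion x_{k+1} = Q[a * x_k] in low precision arithmetic."""
    x = x0
    for _ in range(steps):
        x = quantize(a * x, bits)
    return x


exact = 0.75 * 0.99 ** 100            # the exact recursion decays
trapped = iterate(0.75, 0.99, 4, 100)  # 4 fraction bits: stuck at x0
tracked = iterate(0.75, 0.99, 24, 100)  # 24 fraction bits: follows the decay
```

With a 4-bit fraction the 1% decrement is smaller than half the grid spacing, so every product rounds back to the starting value and the computed state never decays.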
A BILINEAR MODEL FOR LINEAR RECURSIVE COMPUTATIONS
USING FLOATING POINT ARITHMETIC

A stochastic error model for floating point
arithmetic is developed. A characteristic
feature of the floating point error in an operation
is that its bounds are proportional to the result
of the operation. This model is used to study
the effects of finite wordlength on linear
recursive formulas. Optimal realizations exist, but
they are highly sensitive.


1. INTRODUCTION


The accumulation of roundoff error in long
computerized calculations and recursive algorithms
is a phenomenon that can destroy an efficient
and sound computational procedure based on
arithmetic over the real number field. A telltale
example is the Kalman filter divergence.
Analytically this state estimator yields the
estimate with minimal covariance, but the actual
error covariance may be much larger than the
predicted covariance (which solves a certain
Riccati equation) or even grow unboundedly whenever
a finite precision version of the algorithm
is implemented.


It is clear that in order to keep track of
the confidence in the computed results, a certain
measure of confidence should be computed and
"tracked" with each operation or update.


One such measure is the rigorous computation
of error bounds via interval analysis. Not only
are the additional computations that are required
time consuming, but the obtained results may be
overly conservative. The individual rounding
errors in a compounded expression indeed tend to
cancel rather than to reinforce each other if an
unbiased rounding rule is used. In this paper
another measure of confidence is used. It is of
a more probabilistic nature and estimates the
propagated covariance due to the finite wordlength
errors. Such a study is standard (and
straightforward) for fixed point arithmetic, and
is well described in several textbooks, culminating
in the optimal filter implementations by
Mullis and Roberts [8] and the LQG compensator by
Moroney [7]. One of the serious shortcomings of
the use of fixed point arithmetic is the necessity
of scaling in order to provide a higher
accuracy. This disadvantage is obliterated when
floating point arithmetic is used.


Modern digital technology has rapidly
increased the speed of floating point processors.
As a result, these modules are increasingly
introduced in real time applications of estimation,
control, digital filtering, and general
signal processing, and the need for a comprehensive
analysis of their limitations (due to finite
wordlength effects) is obvious. No fully
comprehensive model for the wordlength effects in
floating point exists, although some very
significant contributions have been made in the past.
Attempts to give a rigorous analysis of a sequence
of floating point operations have proven to be so
formidable that one has to content oneself
with plausibility arguments (e.g., see [4],
p. 213). A noteworthy contribution is the
axiomatization by Kulisch [5].


General statistical modeling of floating
point errors relies on the work of Wilkinson
[16], Forsythe and Moler [2], Stewart [11], Knuth
[4], and Vandergraft [12]. Brent [1] and Marasa and
Matula [6] performed extensive simulations for
various finite precision arithmetic systems.
An analysis of the effects in digital filtering
is due to Kan and Aggarwal [3] and others. Rink
and Chong analyzed the performance of a floating
point state regulator [9]. Van Wingerden and De
Koning [13] recently combined this work with a
Monte Carlo identification technique. A dynamical
stochastic model was used by Verriest [14]
in the computation of a gain correction for
(Kalman) filtering applications in floating
point. In this paper, the finite wordlength
effects on discrete linear recursions of the form
(dim x = n, dim u = m, dim y = p)

$$x_{k+1} = A x_k + B u_k, \qquad y_k = C x_k \qquad (1)$$

are analyzed. The paper is organized as follows.
First a model for the floating point errors is
discussed. The third section then uses this
error model to obtain representations for various
confidence measures.


2.  THE  ERROR  MODEL 


We shall work with normalized floating point
numbers. For a given base b, they are expressed
by two numbers, e and f. Here e is an integer
exponent and f is a signed fraction assumed to be
normalized, i.e., $b^{-1} \le |f| < 1$. The value of
the floating point number is then

$$f\, b^{e}$$

Since the coding of the exponent e and the signed
fraction must fit into a given wordlength (w),
there will be a tradeoff between the precision
and the range of the representable numbers in the
computer [1]. However, in this work we shall
assume an infinite range and only consider the
effects of a finite number of digits in the
fraction. Besides simplifying the analysis, this
assumption can be substantiated by the fact that
normally an underflow or overflow would cause
program termination, and we are only interested
in the finite wordlength effects during a normal
program execution. Following Kulisch [5], the
situation is as follows. We have R, the set of
reals, and an operation * (which is +, -, x, or
/). On the computer the elements of R, as well as
the results of a * b, are not exactly representable.
Hence the reals must be mapped into a subset
F according to a proper (i.e., monotone and
symmetric) mapping $Q: R \to F$. The approximation of
the * operation is then

$$Q[*;\, a, b] := Q(a * b) \qquad (2)$$

Unfortunately, the result a * b, which is in general
not representable, seems to be necessary for its
realization. It can be shown, however, that in all
cases where a * b is not exactly representable,
it is sufficient to replace it by an appropriate
and representable value $a \mathbin{\tilde{*}} b$ with the property

$$Q(a * b) = Q(a \mathbin{\tilde{*}} b) \qquad (3)$$

Then the proper definition is

$$Q[*;\, a, b] := Q(a \mathbin{\tilde{*}} b) \qquad (4)$$

The concrete algorithms for the realization of
this formula can then be decomposed into four
steps.

1. Identification of the exponent and fraction
of a and b.

2. Execution of a * b.

3. Renormalization.

4. Mapping into F (because accumulation of
higher wordlength may be used).

The cause of floating point errors is threefold.
First, there are intrinsic errors, due to
finite wordlength representation of a given
number (parameters, inputs...). Even if two
numbers have an exact representation, binary
operations (sum, product...) on them may require
a longer wordlength for exact representation.
Hardware implementation greatly affects this
error (presence or absence of guardbits, double
register arithmetic, rounding or truncation...).
The errors induced by binary operations are
referred to as extrinsic errors. Finally all
these errors propagate through the recursion and
hence accumulate. They are called the inherent
errors since they inherit their properties from
the operation sequence and the given recursion.

The intrinsic errors are bounded by the
"unit in the least significant digit" times $b^e$.
The error is therefore uniform in an interval
with length proportional to the number itself
(i.e., $b^e$). The extrinsic errors are also
proportional to the computed result [12], with an
exception of subtraction of nearly equal quantities,
which may cause a blowup of the relative
error. For the inherent error (also called
accumulated error) Wilkinson [16] (also Forsythe
and Moler [2]) gives error bounds which are
proportional to the computed result. If y is the
exact result of a combination of n multiplications
and divisions, then the relative error in
the computed result is bounded by $n\gamma_0$ for some
$\gamma_0$. Many other bounds on the rounding errors in
algebraic processes are also of the form f(n).
The linear (in n) bound is rather conservative,
for the individual rounding errors in a compounded
expression tend to cancel rather than to
reinforce each other if an unbiased rounding rule
is used. With biased rounding and truncation,
the bound may be more realistic.
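
As a hedged illustration of the mapping Q and of the approximate operation Q[*; a, b], the following Python sketch rounds a real number to a t-bit fraction with an unbounded exponent range. The names `quantize` and `fl_op`, the base b = 2, and the round-to-nearest-even rule are assumptions made for this example; the text does not prescribe them.

```python
import math
import operator

def quantize(x: float, t: int) -> float:
    """Map x to the nearest number f * 2**e with a t-bit fraction, 0.5 <= |f| < 1."""
    if x == 0.0:
        return 0.0
    f, e = math.frexp(x)               # x = f * 2**e with 0.5 <= |f| < 1
    scale = 2 ** t
    f_q = round(f * scale) / scale     # round to nearest, ties to even (unbiased)
    return math.ldexp(f_q, e)          # a fraction rounded up to 1.0 is renormalized here

def fl_op(op, a: float, b: float, t: int) -> float:
    """Sketch of Q[*; a, b] := Q(a * b) for a binary operation `op`."""
    return quantize(op(a, b), t)

if __name__ == "__main__":
    print(quantize(0.1, 4))                    # 0.1015625
    print(fl_op(operator.add, 1.0, 0.1, 4))    # 1.125
```

With this rounding rule the relative error of `quantize(x, t)` never exceeds 2**(-t), which is the proportionality to the result that the error model relies on.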


Because of the above observations, we are
led to a stochastic model for the finite wordlength
error,

$$y - Q[y] = \gamma(n)\, y\, \epsilon \qquad (5)$$

where $\epsilon$ is a sample of a standard white Gaussian
process and $\gamma(n)$ is a normalization factor,
dependent on the number of operations. The
"large" samples $\epsilon$ then simulate the occasional
blowup due to subtraction of nearly equal quantities
(occurring with empirical frequency .14
[4]). It can be shown (by considering error
accumulation in one single batch of $n^2$ or n
batches of n operations) that for consistency
of (5), $\gamma(n)$ must be of order $\sqrt{n}$.
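
The square-root growth of the normalization factor can be checked numerically. The sketch below is a Monte Carlo illustration under stated assumptions: each of n operations multiplies the running result by (1 + gamma1 * eps) with independent standard Gaussian eps, and `relative_error_std`, `gamma1`, `trials`, and `seed` are hypothetical names chosen for this example.

```python
import math
import random

def relative_error_std(n: int, gamma1: float, trials: int = 4000, seed: int = 0) -> float:
    """Sample standard deviation of the accumulated relative error after n operations."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        factor = 1.0
        for _ in range(n):
            # one operation perturbs the running result multiplicatively
            factor *= 1.0 + gamma1 * rng.gauss(0.0, 1.0)
        errors.append(factor - 1.0)
    mean = sum(errors) / trials
    return math.sqrt(sum((err - mean) ** 2 for err in errors) / trials)

if __name__ == "__main__":
    gamma1 = 1e-3
    for n in (16, 64, 256):
        # the ratio stays close to 1 if the error grows like sqrt(n)
        print(n, relative_error_std(n, gamma1) / (gamma1 * math.sqrt(n)))
```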

Remark: The above approximation (5) will be
invalid if the number of binary operations
greatly exceeds the number of independent
variables occurring as operands. In the case of
an equal mixture of +, -, x, and /, Marasa and
Matula [6] have shown by extensive simulation on
combinations  of  10J  operations,  that  the  relative 
error  grows  slightly  faster  than  exponentially. 

3.  ANALYSIS  OF  LINEAR  RECURSIONS 

The formulas (1) are generic state space
representations for digital filters or compensators
as for instance used in feedback controllers.
The signals $u_k$ and $y_k$ are respectively the
input and the output vectors. As explained in
the previous section, the bilinear error model
is assumed, as well as a perfect representation
of the parameter matrices A, B, and C. (This
entails no loss of generality since the effect
of parameter truncation can always be "thrown
back" to the data [12].) By equation (5), the
recursions in floating point can be modeled by
(assuming the use of an unbiased rounding)

$$x^m_{k+1} = A x^m_k + B u_k + \gamma\, \mathrm{diag}(A x^m_k + B u_k)\, D_1 w_k \qquad (6)$$

$$y^m_k = C x^m_k + \beta\, \mathrm{diag}(C x^m_k)\, D_2 v_k \qquad (7)$$

where $(w_k', v_k')'$ is an (n + p) dimensional standard
white Gaussian sequence, $\beta$ and $\gamma$ are normalization
parameters which are purely hardware dependent,
while the elements of the matrices D are
realization dependent, and reflect dependencies
among the computed state or output components.
(If two components of the x-vector are updated
in identical ways, then their errors must also
remain equal.) $\beta$ and $\gamma$ are fixed such that the
maximal sum of squares of the elements for each
row of D is one. If A is a full matrix without
any particular structure, then generically one
can set D = I, and $\gamma$ corresponds with (n + m)
multiplications and n + m - 1 (signed) additions.
A special case occurs for instance if a pair
(A,b) is in canonical form, i.e., for


(8) [canonical form of (A,b); equation not legible]

$$D = \mathrm{diag}(1, 0, \ldots, 0) \qquad (9)$$

For truncation or biased rounding, the probabilistic
model needs to be adjusted to incorporate
the bias, e.g. for truncation

$$Q_t[x] = x + \gamma_t |x| \left(w - \tfrac{1}{2}\,\mathrm{sgn}(x)\right) = \left(1 - \tfrac{1}{2}\gamma_t\right) x + \gamma_t\, x\, w' \qquad (5')$$

where we set $w' = w\,\mathrm{sgn}(x)$. For each type of
arithmetic, a bilinear state model arises. For
this reason we shall refer to the floating point
error model in the previous section as the
"bilinear error model."

The propagation of the expected value $\bar{x}_k = E x^m_k$ of the
above model state is

$$\bar{x}_{k+1} = A \bar{x}_k + B \bar{u}_k \qquad (10)$$

Clearly, if $\bar{u}_k = u_k$ and $\bar{x}_0 = x_0$, then the solutions
of (10) and (1) are identical. Therefore,
the exact recursion (1) can be interpreted as
the expectation of the floating point model.
Subtracting (10) from (6), the floating point
error $\tilde{x}^m_k = x^m_k - \bar{x}_k$ satisfies

$$\tilde{x}^m_{k+1} = [I + \gamma\, \mathrm{diag}(D_1 w_k)]\, A \tilde{x}^m_k + \gamma\, \mathrm{diag}(D_1 w_k)\, \bar{x}_{k+1} \qquad (11)$$


A criterion for almost sure stability can be
established for first order systems based on
Grintsevichyus' theorem [17, p. 153] for steady-state
conditions (i.e., if $\bar{x}_k \to \bar{x}_\infty = b u_\infty / (1 - a)$).
Namely if

$$-\infty < E \log|a(1 + \gamma w)| < 0$$

then the error $\tilde{x}^m_k$ is a.s. convergent for all k
iff

$$E \log \sup(\gamma |\bar{x}_\infty w|,\, 1) < \infty$$

i.e., if $|a| < f(\gamma)$ for some function f of $\gamma$.
For higher order systems (the case of interest),
we shall only be concerned with the first and
second moments. The following properties were
derived:

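Theorem 1: For the bilinear model (6)-(11), the
error covariances

$$P^m_k = E(x^m_k - \bar{x}_k)(x^m_k - \bar{x}_k)', \qquad V^m_k = E(y^m_k - \bar{y}_k)(y^m_k - \bar{y}_k)' \qquad (12)$$

(with $\bar{x}_k = E x^m_k$ and $\bar{y}_k = C \bar{x}_k$) solve the recursion

$$P^m_{k+1} = A P^m_k A' + \gamma^2 D_1 D_1' * (A P^m_k A' + \Pi_{k+1}) \qquad (13)$$

$$\Pi_k = \bar{x}_k \bar{x}_k' \qquad (14)$$

$$\bar{x}_{k+1} = A \bar{x}_k + B u_k \qquad (15)$$

$$V^m_k = C P^m_k C' + \beta^2 D_2 D_2' * C(P^m_k + \Pi_k)C' \qquad (16)$$

A proof is straightforward by "squaring up" (11)
and taking expectations, noting that $w_k$ is
independent from $\tilde{x}^m_k$ and $\bar{x}_k$. Finally the identity

$$\mathrm{diag}(x)\, Q\, \mathrm{diag}(x) = xx' * Q \qquad (17)$$

is used, where * is the Schur product (i.e.,
$(A * B)_{ij} = A_{ij} B_{ij}$).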

Remarks:

1. If the input $u_k$ is purely random with zero
mean and covariance $Q_k$, then

$$\Pi_{k+1} = A \Pi_k A' + B Q_k B' \qquad (18)$$

is substituted for (14) and (15). The relative
error is then the "ratio" $P^m_k \Pi_k^{-1}$, where
$\Pi_k = E x_k x_k'$.

2. Defining E as the unit under *, i.e., $E_{ij} = 1$,
we can rewrite (13) as

$$P^m_{k+1} = (E + \gamma^2 D_1 D_1') * A P^m_k A' + \gamma^2 D_1 D_1' * \Pi_{k+1} \qquad (13')$$

It is clear that even when A is strictly stable,
$P^m_k$ may grow unboundedly, due to the presence of
the positive Schur factor $E + \gamma^2 D_1 D_1'$.
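
The covariance recursion in its Schur-product form is simple to run alongside the filter. The following Python sketch is a minimal illustration, assuming plain nested lists for matrices and a constant $\Pi$ (steady input); the helper names `matmul`, `transpose`, and `covariance_step` are hypothetical.

```python
def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def covariance_step(P, A, DDt, Pi_next, gamma):
    """One step P+ = (E + g^2 DD') * (A P A') + g^2 DD' * Pi_next,
    where * is the elementwise (Schur) product and E is the all-ones matrix."""
    APA = matmul(matmul(A, P), transpose(A))
    n = len(A)
    g2 = gamma ** 2
    return [[(1.0 + g2 * DDt[i][j]) * APA[i][j] + g2 * DDt[i][j] * Pi_next[i][j]
             for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    gamma, Pi = 0.1, [[4.0]]
    for a in (0.5, 0.999):                 # both strictly stable: |a| < 1
        P = [[0.0]]
        for _ in range(5000):
            P = covariance_step(P, [[a]], [[1.0]], Pi, gamma)
        print(a, P[0][0])                  # bounded for a = 0.5, growing for a = 0.999
```

The second case shows the effect noted in the remark: a = 0.999 is strictly stable, yet (1 + gamma^2) a^2 exceeds one and the error covariance grows without bound.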
Theorem 2: The model (6) is second order
stochastically stable if the eigenvalues of A are
within a circle with radius $(1 + \gamma^2)^{-1/2}$.
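
For a first order system the radius in Theorem 2 can be checked by hand. The lines below are a sketch for the scalar case only, assuming $D_1 = 1$ and a constant $\Pi$; they illustrate the bound and are not the proof of the theorem.

```latex
\begin{align*}
% scalar form of the covariance recursion (a, \gamma real, D_1 = 1)
P^m_{k+1} &= (1+\gamma^2)\,a^2\,P^m_k + \gamma^2\,\Pi \\
% the homogeneous part contracts if and only if
(1+\gamma^2)\,a^2 < 1 \quad &\Longleftrightarrow \quad |a| < (1+\gamma^2)^{-1/2} \\
% in which case the steady state is
P^m_\infty &= \frac{\gamma^2\,\Pi}{1-(1+\gamma^2)\,a^2}
\end{align*}
```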

Another measure of the similarity of computed
result and exact result of the recursion is
given by the generalized correlation coefficient,
between the outputs of the exact (y) and the
finite precision system ($y^f$). We define this
generalized correlation coefficient as

$$\rho^f = \lim_{N \to \infty} \rho_N(y, y^f) \qquad (19)$$

$$\rho_N(y, y^f) = \frac{\mathrm{Tr}[Y (Y^f)']}{\left(\mathrm{Tr}(YY')\, \mathrm{Tr}(Y^f (Y^f)')\right)^{1/2}} \qquad (20)$$

where Y is the data matrix $(y_1, y_2, \ldots, y_N)$, and
similarly for $Y^f$. Note that $\rho_N(y, y^f)$ can be
written in terms of the sample correlation
functions

$$\rho_N(y, y^f) = \frac{\frac{1}{N}\mathrm{Tr}\left(\sum_{i=1}^{N} y_i (y^f_i)'\right)}{\left[\frac{1}{N}\mathrm{Tr}\left(\sum_{i=1}^{N} y_i y_i'\right)\, \frac{1}{N}\mathrm{Tr}\left(\sum_{i=1}^{N} y^f_i (y^f_i)'\right)\right]^{1/2}} \qquad (21)$$

assuming a random input with variance Q, and
invoking ergodicity. For the model (6)-(11), the
$\rho_N(y, y^m)$ can be precomputed.
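
The finite-sample coefficient $\rho_N$ can be computed directly from two recorded output sequences. The sketch below assumes each sequence is a list of N output vectors; the function name `generalized_correlation` and the sample data are made up for illustration.

```python
import math

def generalized_correlation(Y, Yf):
    """rho_N for two sequences of output vectors y_i (exact) and y_i^f (finite precision)."""
    cross = sum(a * b for y, yf in zip(Y, Yf) for a, b in zip(y, yf))   # Tr[Y (Yf)']
    energy_y = sum(a * a for y in Y for a in y)                          # Tr[Y Y']
    energy_yf = sum(b * b for yf in Yf for b in yf)                      # Tr[Yf (Yf)']
    return cross / math.sqrt(energy_y * energy_yf)

if __name__ == "__main__":
    exact = [[1.0, 2.0], [3.0, 4.0], [-1.0, 0.5]]
    computed = [[1.01, 1.98], [3.02, 4.05], [-0.99, 0.51]]
    print(generalized_correlation(exact, computed))   # slightly below 1
```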

Theorem 3: The steady state correlation between
the model output $y^m$ and $y$ ($= y^m$ for $\gamma = \beta = 0$) is

$$\rho^m = \frac{1}{\sqrt{1+\beta^2}} \sqrt{\frac{\mathrm{Tr}(C \Pi C')}{\mathrm{Tr}(C \Pi^m C')}} \qquad (22)$$

where $\Pi^m$ and $\Pi$ are the solutions to the
(extended) Lyapunov equation

$$X = (AXA' + BQB') * (E + \sigma D_1 D_1') \qquad (23)$$

for $\sigma = \gamma^2$ and $\sigma = 0$, respectively.
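
A hedged numerical sketch of Theorem 3 follows. It solves the extended Lyapunov equation by fixed-point iteration, which is an assumption of this example (any convergent solver would do, and the iteration only converges when the model is second order stable), and it assumes the output matrix $D_2 D_2'$ has unit diagonal. All function names are hypothetical.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def solve_extended_lyapunov(A, B, Q, DDt, sigma, iterations=2000):
    """Iterate X <- (A X A' + B Q B') * (E + sigma DD'), with * the elementwise product."""
    n = len(A)
    BQB = matmul(matmul(B, Q), transpose(B))
    X = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        AXA = matmul(matmul(A, X), transpose(A))
        X = [[(AXA[i][j] + BQB[i][j]) * (1.0 + sigma * DDt[i][j]) for j in range(n)]
             for i in range(n)]
    return X

def steady_state_correlation(A, B, C, Q, DDt, gamma, beta):
    """Correlation between exact and model outputs from the two Lyapunov solutions."""
    Pi = solve_extended_lyapunov(A, B, Q, DDt, 0.0)            # exact system
    Pi_m = solve_extended_lyapunov(A, B, Q, DDt, gamma ** 2)   # floating point model
    num = trace(matmul(matmul(C, Pi), transpose(C)))
    den = trace(matmul(matmul(C, Pi_m), transpose(C)))
    return math.sqrt(num / den) / math.sqrt(1.0 + beta ** 2)

if __name__ == "__main__":
    A = [[0.5, 0.1], [0.0, 0.3]]
    B = [[1.0], [1.0]]
    C = [[1.0, 0.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(steady_state_correlation(A, B, C, [[1.0]], identity, gamma=0.1, beta=0.1))
```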

Note that the quantity $\rho^m$ can easily be
interpreted in terms of a signal to (computation)
noise ratio. Based on the bilinear model we can
now try to find special realizations for which
the error covariance is minimal or the correlation
coefficient is maximal. It was found by
simulation ($DD' = I$) that many equivalent optima
exist. In fact, the error measures fluctuate
rapidly between a minimum and a maximum value.
This high sensitivity may make an optimal
realization impractical. We also established the
following (expected) results.

Theorem 4: If $DD' = I$, then scaling (i.e., a
diagonal similarity transformation) leaves the
error properties invariant.


Theorem 5: For a given realization (A,B,C), the
error properties are left invariant if Q is
multiplied by a positive constant.

Remark: We have not touched upon certain important
and interesting issues. If very low precision
is used, it can be shown that trapstates
may exist. These are vectors of floating point
numbers such that $Q[x_{k+1}] = Q[x_k]$ for the
undriven system. Obviously, zero will be a
trapstate, but many more may exist, depending on
the iteration and the precision. Whenever nonzero
trapstates exist, the bilinear model will break
down. The details are under study.
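
Trapstates are easy to exhibit for a scalar undriven recursion. The sketch below reads a trapstate as a representable x with Q[a x] = x, which is this example's interpretation of the definition; the base-2 rounding routine `quantize` and the function `trapstates` are hypothetical helpers, not taken from the text.

```python
import math

def quantize(x: float, t: int) -> float:
    """Round x to a t-bit fraction f with 0.5 <= |f| < 1 and unbounded exponent."""
    if x == 0.0:
        return 0.0
    f, e = math.frexp(x)
    scale = 2 ** t
    return math.ldexp(round(f * scale) / scale, e)

def trapstates(a: float, t: int):
    """Nonzero t-bit fractions x in [0.5, 1) that the recursion x <- Q[a x] leaves unchanged."""
    candidates = [k / 2 ** t for k in range(2 ** (t - 1), 2 ** t)]
    return [x for x in candidates if quantize(a * x, t) == x]

if __name__ == "__main__":
    print(trapstates(0.95, 3))     # [0.5, 0.625, 0.75, 0.875]
    print(trapstates(0.95, 10))    # []
```

With a 3-bit fraction the stable recursion with a = 0.95 never decays, because every product rounds back to its starting value; at 10 bits no nonzero trapstate remains in this range.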

4. FINAL REMARKS AND CONCLUSIONS

Simulation of the bilinear model yields an
output that fluctuates too fast as compared to
the output of a simulation of the roundoff
effects on the recursions. This leads to the
conclusion that the developed error model is only
good to provide RMS bounds on the error. The
sample paths of the bilinear model (6-7) are in
no way a good representation of an actual sample
run of a computer with small word length. Our
suggestion is to run the covariance updates along
with the actual recursion (i.e., one updates all
of (11-16)). A one-$\sigma$ or three-$\sigma$ confidence
ellipse can then be constructed around the
computed update $y_k$, based on the matrix $V^m_k$.

Concluding, we state that a bilinear error
model can be used to obtain confidence bounds on
computed recursions. These bounds are less
conservative than the absolute bounds provided by
interval arithmetic. Based on this model, we
have shown that optimal realizations exist, but
are too sensitive to be of practical value.
Abstract


The effects of the finite word length in a
floating point implementation of the least squares
filter are discussed. Optimal precomputable gains
are given, and a computationally more attractive
approximation is given.

1. INTRODUCTION

Optimality of the Kalman filter is only guaranteed
if computations can be performed in infinite
precision. Therefore the realistically computed
estimates are no longer optimal. This paper takes
the finite word length constraints of the digital
machine into account in the algorithm design.

Finite word length effects in fixed point are
now well understood and described in several books,
culminating in the optimal filter implementations by
Mullis and Roberts [1] and the compensator by
Moroney [2].

General statistical modeling of floating point
errors relies on the work of Wilkinson [3], Knuth
[4] and Vandergraft [5]. The effects on the
performance of compensators have been studied [6,7]. A
gain adjustment for the filter was given by this
author in [8].

2.  THE  ERROR  MODEL 

The cause of floating point errors is threefold.
First, there are "intrinsic" errors, due to
finite wordlength representation of a given number
(parameters, inputs...). Even if two numbers have
an exact representation, binary operations (sum,
product...) on them may require a longer wordlength
for exact representation. Hardware implementation
greatly affects this error (presence or absence of
guardbits, double register arithmetic, rounding or
truncation...). The errors induced by binary
operations are referred to as "extrinsic" errors.
Finally all these errors propagate through the
recursion and hence accumulate. They are called the
"inherent" errors since they inherit their properties
from the operation sequence and the given
recursion.

A characteristic feature of the floating point
errors is that they are proportional to the computed
results (i.e. one has "multiplicative noise"). If
$Q(y_k)$ represents the result of a recursive computation
of a vector $y_k$, then the model of the computation
error is

$$y_k = Q(y_k) + \gamma\, \mathrm{diag}(y_k)\, W \epsilon_k \qquad (1)$$

where $\epsilon_k$ is a sample of an n-dimensional standard
white Gaussian sequence, $\gamma$ is a parameter which is
purely hardware dependent and the elements of W
depend on the particular recursion (dependencies of
components of $y_k$).

The rest of the paper deals with the filter for
the model

$$x_{k+1} = F x_k + G u_k$$

$$y_k = H x_k + v_k$$

$$(u_k', v_k')' \sim N\left((0, 0),\ \begin{pmatrix} Q & 0 \\ 0 & R \end{pmatrix}\right)$$

3.  THE  OPTIMAL  FILTER  FOR  DEGRADED  ARITHMETIC 

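First it will be assumed that all gains can be
precomputed and set in infinite precision. This not
only simplifies the analysis, but can also be justified.
The error due to the difference in the
desired and implemented gain can be "thrown" back to
the data, a technique known as inverse error analysis.
Hence without loss of generality the (actual)
computed filter update (with gain $K_k$) is modeled by
(suppressing the "Q"-notation)

$$\hat{x}_{k+1} = F \hat{x}_k + K_k (y_k - H \hat{x}_k) + \phi_k$$

where $\phi_k$ is the finite wordlength error. The
part $F \hat{x}_k + K_k (y_k - H \hat{x}_k)$, which is the desired or
theoretical update, will be defined as the CORE of
the estimate. The estimation error $\tilde{x} = x - \hat{x}$ then
satisfies

$$\tilde{x}_{k+1} = F \tilde{x}_k + G u_k - K_k H \tilde{x}_k - K_k v_k - \phi_k \qquad (5)$$

The covariances $P = E \tilde{x} \tilde{x}'$ and $\Sigma = E \hat{x} \hat{x}'$, together with
the cross-covariance $C = E \tilde{x} \hat{x}'$, satisfy the
coupled matrix equations

$$\Sigma_{k+1} = F \Sigma_k F' + K_k H C_k F' + F C_k' H' K_k' + K_k R^\epsilon_k K_k' + E \phi_k \phi_k' \qquad (6)$$

$$C_{k+1} = (F - K_k H) C_k F' + (F - K_k H) P_k H' K_k' - K_k R K_k' - E \phi_k \phi_k' \qquad (7)$$

$$P_{k+1} = (F - K_k H) P_k (F - K_k H)' + G Q G' + K_k R K_k' + E \phi_k \phi_k' \qquad (8)$$

where $R^\epsilon_k$ is as usual the innovations covariance

$$R^\epsilon_k = H P_k H' + R \qquad (9)$$

Appropriate initial conditions are

$$\Sigma_0 = 0, \qquad C_0 = 0, \qquad P_0 = \Pi_0$$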

For  fixed  point  arithmetic,  the  driving  term  in  (6)- 
(9)  is 

E(Vk’  "  OO) 


and thus not only constant, but also independent of
C, P and $\Sigma$. The optimal gain can then be found
directly by minimization with respect to $K_k$:

$$K_k = F P_k H' (R^\epsilon_k)^{-1} \qquad (11)$$

However, generically, the computed estimate is no
longer orthogonal to its error.
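
For a first order filter the fixed point case can be checked in a few lines. The sketch below is a scalar illustration under stated assumptions: the error covariance is propagated as $(F - KH) P (F - KH)' + GQG' + KRK'$ plus a constant computation-noise variance, and the gain is $F P H' (H P H' + R)^{-1}$. The function names and the numerical values are hypothetical.

```python
def optimal_gain(P, f, h, r):
    """Scalar gain K = f P h / (h P h + r) minimizing the next error covariance."""
    return f * P * h / (h * P * h + r)

def covariance_update(P, K, f, h, g, q, r, noise_var=0.0):
    """Scalar error covariance after one step with gain K; noise_var is the
    constant computation-noise variance of the fixed point case."""
    return (f - K * h) ** 2 * P + g * q * g + K * r * K + noise_var

if __name__ == "__main__":
    f, h, g, q, r = 0.9, 1.0, 1.0, 0.5, 1.0
    P = 2.0
    for k in range(5):
        K = optimal_gain(P, f, h, r)
        P = covariance_update(P, K, f, h, g, q, r, noise_var=1e-4)
        print(k, K, P)
```

Because the computation-noise variance does not depend on K in fixed point, it only shifts the covariance and leaves the minimizing gain unchanged.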

The situation is different under floating point
arithmetic. In this case the statistics of the
computation noise depend on the core estimate. As
seen in section 2

$$\phi_k = \gamma\, \mathrm{diag}[F \hat{x}_k + K_k (y_k - H \hat{x}_k)]\, W w_k \qquad (12)$$

The resulting driving term in (8) is the expectation
(* is the Schur product)

$$E \phi_k \phi_k' = \gamma^2\, L_k * WW' \qquad (13)$$

where $L_k$ is the core update of $\Sigma$, i.e.

$$L_k = F \Sigma_k F' + K_k H C_k F' + F C_k' H' K_k' + K_k R^\epsilon_k K_k' \qquad (14)$$

This couples (8) to (6) and (7). The optimal filter
can be derived via an equivalent constrained
minimization problem of the final covariance

$$J = E \|x_N - \hat{x}_N\|^2 = \mathrm{Tr}(P_N) \qquad (15)$$

with respect to the sequence $\{K_k\}$, and subject to
(6)-(8), (13) and (14).

Adjoining the constraints, the Hamiltonian for
the system is

$$H_k = \mathrm{Tr}\left(\Lambda^P_{k+1} P_{k+1} + \Lambda^C_{k+1} C_{k+1} + \Lambda^{C\prime}_{k+1} C_{k+1}' + \Lambda^\Sigma_{k+1} \Sigma_{k+1}\right) \qquad (16)$$

where the right hand sides of (6)-(8) are actually
substituted for the $P_{k+1}$, $C_{k+1}$ and $\Sigma_{k+1}$. The
boundary conditions are $\Lambda^P_N = I$, $\Lambda^C_N = 0$, $\Lambda^\Sigma_N = 0$ and
the generalized Euler-Lagrange equations lead after
some algebra to the (backward) recursions
where

$$\tilde{\Lambda} = (\Lambda^P - \Lambda^C - \Lambda^{C\prime} + \Lambda^\Sigma) * WW' = \bar{\Lambda} * WW' \qquad (20)$$

The optimality condition $\partial H / \partial K = 0$ finally yields
the gain K


Kopt. 
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-  «*£♦!  -  Cl  *  ^)PCklH'"ke  (2!) 

where  ■  1  ♦  t2(WW' )^. 

It is clear that the potential benefit of an
adjustment to the optimal gain is grossly offset by
the computational burden in solving the above
equations. Moreover, a two point boundary value problem
(TPBVP) needs to be solved since the forward and the
backward equations are coupled. The optimal
gains $K_k$ as computed by (21) will therefore depend
on N and should be denoted as $K^N_k$, k = 0, ..., N-1.
Hence in order to obtain the optimal least squares
estimate at each time N, a new series of gains for
k = 0 to k = N-1 needs to be computed. A filter
implementation with precomputed gains would require
lots of memory (order $N^2$).

Rather than implementing the optimal scheme, a
suboptimal floating point correction will be derived
which is more straightforward to realize. Availability
of the optimal solution remains beneficial to
analyze the performance of the proposed schemes.

4. SUBOPTIMAL FILTERS

The stepwise optimization formulas of section 3
are used to obtain a minimum error covariance $P_1$
within the structure. The one-step gain K is
computed (diag WW' = I) from (23) and the subscript 0
simply replaced by k. Thus
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(6)-(8)  and  (13)- 


5. CONCLUSIONS


For the floating point implementation of linear
filters the optimal precomputable gains are
characterized as the solution to a nontrivial TPBVP. A
computationally more tractable approximation is
presented. It requires more computational effort
than the Kalman filter, but offsets the degradation
due to the finite wordlength. This disadvantage
disappears since the gains are precomputable as
opposed to the method in [8]. The methods and
results described can be carried over to the
regulator as well.
ABSTRACT

It has been shown in earlier papers that the canonical correlation
analysis and the principal component analysis as applied to the stochastic
realization problem can be derived under the unified framework of the
RV-coefficient. This RV-coefficient also provides a common statistical measure
of  information  that  can  be  used  in  the  evaluation  and  comparison  of  the 
different  methods.  It  is  shown  here  that  this  RV-coefficient  has  a  very 
natural  interpretation  in  terms  of  the  information  structure  (and  not  just 
the  data)  of  the  problem  itself.  More  specifically,  it  can  be  derived  from 
the  generalized  probability  measures  (Gleason  measures)  defined  on  the 
propositional  system  associated  with  the  modeling  problem.  This  problem 
draws  some  close  analogy  to  the  foundations  of  modern  quantum  theory. 
Introduction.


A  fundamental  problem  in  modelling,  identification,  signal  processing, 
digital  filtering  and  cluster  analysis  is  that  of  finding  a  finite 
dimensional  Markovian  representation  of  a  stochastic  sequence  from  the 
covariance  information.  This  stochastic  realization  problem  has  been  widely 
discussed  [A,F,B,DP,AK,RV] .  Whenever  a  finite  set  of  real  data  is  gathered, 
all  processing  is  performed  over  finite  sets,  and  in  most  cases  an 
underlying  probabilistic  model  is  absent.  As  a  result,  covariances  must  be 
estimated  from  the  observed  time  series.  A  more  direct,  data  driven 
approach  is  favorable.  Moreover,  for  many  applications,  the  Markovian 
representation  or  state  space  model  may  be  too  complex,  due  to  a  high 
dimensionality,  thus  barring  efficient  computational  management.  This 
motivates  the  quest  for  lower  order  models,  and  a  common  measure  for  the 
evaluation  of  the  performance  of  the  different  modeling  approaches. 

Many  methods  exist  for  the  determination  of  a  stochastic 
realization.  Two  philosophies,  deeply  rooted  in  multivariate  statistical 
analysis,  are  singled  out.  Akaike  [A]  and  Faurre  [F],  among  others, 
developed  the  theory  based  on  the  information  interface  between  the  past  and 
the  future  of  a  time  series.  This  led  Desai  and  Pal  [DP]  to  an  algorithm 
for  obtaining  a  stochastic  realization  and  model  reduction  scheme,  based  on 
the  Canonical  Correlation  Analysis  (CCA).  Their  realizations  form  the 
counterpart  to  the  deterministic  balanced  realizations  of  Moore  [M] . 

Another  method  based  on  the  Karhunen-Loeve  Method  (KLM) ,  also  known  as  the 
Principal  Component  Analysis  (PCA)  has  been  proposed  by  Arun  and  Kung  [AK] . 

Ramos and Verriest [RV] unified the CCA and KLM methods by showing
that they are both special cases of a more general optimization problem,
using the RV-coefficient introduced in multivariate analysis by Escouffier
[E]. Given two zero-mean random vectors x and y, with covariance matrices
$R_{xx}$, $R_{yy}$ and cross-covariance $R_{xy}$, the RV-coefficient is defined by

$$RV(x, y) = \frac{\mathrm{Tr}[R_{xy} R_{yx}]}{\{\mathrm{Tr}[R_{xx}^2]\, \mathrm{Tr}[R_{yy}^2]\}^{1/2}}, \qquad \text{where } R_{yx} = R_{xy}'$$
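
As a hedged illustration of the definition, the following Python sketch estimates the RV-coefficient from paired zero-mean samples. The sample covariance estimator with divisor N and the names `cross_cov`, `frob2`, and `rv_coefficient` are assumptions made for this example.

```python
import math

def cross_cov(X, Y):
    """Sample cross-covariance (1/N) sum_i x_i y_i' of zero-mean sample lists X and Y."""
    N = len(X)
    return [[sum(x[i] * y[j] for x, y in zip(X, Y)) / N for j in range(len(Y[0]))]
            for i in range(len(X[0]))]

def frob2(M):
    """Squared Frobenius norm, equal to Tr[M M']."""
    return sum(v * v for row in M for v in row)

def rv_coefficient(X, Y):
    Rxy, Rxx, Ryy = cross_cov(X, Y), cross_cov(X, X), cross_cov(Y, Y)
    # Tr[Rxy Ryx] = ||Rxy||_F^2, and Tr[Rxx^2] = ||Rxx||_F^2 because Rxx is symmetric
    return frob2(Rxy) / math.sqrt(frob2(Rxx) * frob2(Ryy))

if __name__ == "__main__":
    X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
    Y = [[0.9, 0.1], [-1.1, 0.0], [0.1, 2.1], [0.1, -2.2]]
    print(rv_coefficient(X, Y))
```

The coefficient equals one when y is a scaled copy of x and zero when the two vectors are uncorrelated.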


It was shown that this common statistical measure of information provides a
rationale for drawing inferences about the performance of the algorithm.

It further unifies the exact covariance and real data case by relating this
RV-coefficient to certain operators in a tensor product space G ® H where G
and  H  are  separable  Hilbert  spaces  [V].  Here  G  is  the  base  space,  and  H  is 
respectively  1^(0,  B,m)  and  R  . 

In  this  paper,  the  geometry  of  the  stochastic  realization  problem,  both 
exact  and  approximate,  is  investigated,  and  it  is  linked  to  some  notions  in 
the  theoretical  foundations  of  quantum  mechanics.  More  precisely,  measures 
on  the  subspaces  of  a  Hilbert  space  are  introduced,  which  relate  to  the 
density  matrices  in  the  quantum  mechanical  context.  It  is  then  shown  that 
the above mentioned RV-coefficient is but one possible measure. Many others
can be defined, leading of course to slightly different results. We will
not emphasize the algorithmic solution of the realization problem, to which
many fine researchers have substantially contributed. A motivation for the
use of the RV-coefficient in the stochastic realization problem is
presented.


Philosophy  of  Stochastic  Realization  Theory 

As  originally  formulated,  the  stochastic  realization  theory  deals  with 
the following problem (for a full mathematical statement, see e.g. [F]):
Given a process with covariance sequence $R_k$, find a Markovian representation
or  state  space  model  of  minimal  dimension,  that  generates  an  output  with  the 
given  covariance  function.  This  problem  presupposes  the  knowledge  of  the 
exact  covariance  sequence.  A  perhaps  more  realistic  formulation  is:  Given 
an observed sample path of a stochastic process, find a realization of
minimal  dimension  which  generates  an  output  with  the  same  covariance  as  the 
observed  sequence.  There  are  thus  two  parts:  on  one  hand,  there  is  the 
realization  of  the  model  given  the  exact  covariance  sequence  of  the  given 
process.  This  will  be  referred  to  as  the  exact  problem.  There  also  is  the 
problem  of  determining  or  estimating  the  covariance  sequence  from  the 
observed  data  sequence.  Stochastic  realization  theory  is  therefore  a 
statistical  theory.  The  problem  with  this  is  that  many  observed  processes 
occur  only  once,  and  that  one  thus  can  hardly  speak  of  a  statistical  model 
with repeated sample paths. It is our intention to reformulate the
realization problem from the point of view of observed data. In order to do
so, we rely on some of the same principles that guided physicists to the
logical foundations of modern quantum mechanics [P,BV,Gn,Gd,Va], which is
founded on the impossibility of certain knowledge. Although we will not
discuss  deterministic  realization  theory,  it  is  worth  mentioning  that  a 
similar  approach  can  be  taken  here,  leading  to  the  usual  Boolean  logic,  also 
pertinent  in  the  logical  foundations  of  classical  mechanics. 

The  Propositional  System. 

We will depart slightly from the usual setup of the problem. First of
all, only the observed time series is available to the modeler. Hence, any
notion of "state" should refer somehow to the observed data sequence
$y_1, y_2, \ldots, y_k$. As more things are observed with time, the
information grows.

More  possibilities  will  appear  ("splittings"  in  previously  unresolvable 
information).  We  shall  define  the  state  of  the  system  as  the  entity 
representing  the  maximal  information  that  possibly  can  be  given  about  the 
system. In order to build this up axiomatically, from "first principles",
we are led to consider the fundamental idea of a question, defined as
"every experiment leading to an alternative of which the terms are only Yes
or No" (e.g. "the value obtained by the time series at time i lies in the
Borel set B"). All information one has about the system is captured in the
totality  of  all  such  elementary  questions.  The  problem  is  that  there  are 
very  many  such  questions  to  ask,  and  unless  we  have  a  good  theory  about 
these  questions  and  the  relations  among  them,  this  framework  would  be  of  no 
value. We say that the question is certain (true) if the truth or
falseness of its answer can be stated with absolute certainty. If, whenever
question $Q_1$ is true, question $Q_2$ is always true, we say $Q_1$ implies
$Q_2$, and write $Q_1 \le Q_2$.

If we define two questions as equivalent if $Q_1 \le Q_2$ and
$Q_2 \le Q_1$, and denote the resulting equivalence class by $[Q_i]$, then
the implication is a partial ordering on the quotient set. The equivalence
classes of questions will be called propositions. We can also make new
questions from old ones, by introducing the greatest lower bound (GLB) and
least upper bound (LUB) of any two questions. The greatest lower bound of a
family of questions $\{Q_i\}$ is defined as the question $\wedge_i Q_i$ such
that $\wedge_i Q_i$ true means that in the event of the measurement
(verification) of an arbitrary one of the $Q_i$'s, the result "Yes" is
certain. The least upper bound $\vee_i Q_i$ of the family of questions is
then the greatest lower bound of all questions $S_j$ such that
$Q_i \le S_j \ \forall i$.

This  induces  a  greatest  lower  bound  and  a  least  upper  bound  on  the  set  of 
propositions. There exist a minimal proposition 0 (always false) and a
maximal one, 1, which is always true. It is important to point out that for
the stochastic problem exact knowledge of the truth of the proposition
$y_1 = 1$ does not allow one to say anything with absolute certainty about
the output $y_2$. One says that the propositions regarding $y_1$ are
incompatible with the propositions regarding $y_2$. The greatest lower bound
is therefore 0, i.e. the false statement. Propositions regarding the output
at one particular time are compatible (simultaneously verifiable). On the
other hand, the propositional system for a deterministic system allows
propositions of the form $(y_1 = 1,\ y_2 = 2)$. Furthermore, if Q is a
question, then the question that leads to the reverse truth statements is
supposed to exist as well, and is called the orthocomplement.

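At a more abstract level, a partially ordered set L is a complete
lattice if each subset of L admits a GLB and a LUB, which belong to L. An
orthocomplementation in a lattice is a map $L \to L : p \mapsto p^{\#}$ such
that

$$\begin{aligned}
&\text{i)} && (p^{\#})^{\#} = p && \forall p \in L \\
&\text{ii)} && p \wedge p^{\#} = 0 ; \quad p \vee p^{\#} = 1 && \forall p \in L \\
&\text{iii)} && p \le q \Rightarrow q^{\#} \le p^{\#}
\end{aligned} \tag{2}$$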

Examples of a complete orthocomplemented lattice are the power set of a
given set, and (more generally) any σ-algebra B as for instance used in the
Kolmogorovian probability theory. Note that this σ-algebra is isomorphic to
the set of propositions that can be made about the probabilistic events. A
lattice is weakly modular if $p \le q \Rightarrow q \wedge (q^{\#} \vee p) = p$.
If $p < q$, one says that q covers p if $p \le x \le q \Rightarrow x = p$ or
$x = q$. Elements which cover 0 are called atoms. A lattice is atomic if
$\forall p \ne 0$, there exists an atom $a \le p$. A proposition system or
logic is then abstractly defined as a complete orthocomplemented, weakly
modular and atomic lattice. More details can be found in the literature
(e.g. [BV,P]).
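The lattice axioms above can be checked mechanically on the simplest example mentioned in the text, the power set of a finite set. The following is only a sketch under that assumption: the three-element universe and all function names are chosen for illustration, with set inclusion as the ordering, intersection as GLB, union as LUB, and set complement as orthocomplement.

```python
from itertools import combinations

UNIVERSE = frozenset({1, 2, 3})  # hypothetical three-point outcome set

def power_set(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

L = power_set(UNIVERSE)
ZERO, ONE = frozenset(), UNIVERSE

def leq(p, q):   # implication ordering: p implies q
    return p <= q

def meet(p, q):  # greatest lower bound
    return p & q

def join(p, q):  # least upper bound
    return p | q

def comp(p):     # orthocomplement
    return UNIVERSE - p

def is_orthocomplemented():
    involutive = all(comp(comp(p)) == p for p in L)
    complements = all(meet(p, comp(p)) == ZERO and join(p, comp(p)) == ONE
                      for p in L)
    order_reversing = all(leq(comp(q), comp(p))
                          for p in L for q in L if leq(p, q))
    return involutive and complements and order_reversing

def is_weakly_modular():
    return all(meet(q, join(comp(q), p)) == p
               for p in L for q in L if leq(p, q))

def is_distributive():
    return all(meet(a, join(b, c)) == join(meet(a, b), meet(a, c))
               for a in L for b in L for c in L)

def atoms():
    # Atoms cover 0: non-zero elements with nothing strictly in between.
    return [p for p in L if p != ZERO and not any(ZERO < x < p for x in L)]
```

For this Boolean example all three checks succeed and the atoms are the singletons. The same checks applied to a lattice of subspaces would be expected to fail the distributivity test.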

In the above example of a σ-algebra, the distributive property

$$S_1 \wedge (S_2 \vee S_3) = (S_1 \wedge S_2) \vee (S_1 \wedge S_3)$$

holds. A complete orthocomplemented distributive atomic lattice is called a
classical (Boolean) proposition system. The usual Aristotelian logic reigns
in such a system. The propositional system (propositions about subsets of
phase space) of a classical mechanical system has this property. In fact,
this is also the logic underlying the relations among the propositions of
the probabilistic events in the Kolmogorov sense. By the Loomis
representation theorem, there exists then a set Ω such that
$L \cong B(\Omega)$, i.e. the propositional system is isomorphic to a set of
subsets of some set.

Whereas the Kolmogorovian probability starts from a set of possible
outcomes, on which a σ-algebra of events is described, in the above
alternate way, the σ-algebra (read: propositional system) comes first. Now
this latter viewpoint leads to the right generalizations. However, a more
general logic L may fail to be distributive, so that the propositional
system can no longer be isomorphic to the lattice of subsets of a set. Yet,
even in such a case, one would like to define a reasonable notion of
"probability" for the events (propositions). This can be done in a
consistent way, generalizing classical probability [Gu].

It is already clear from our remarks regarding the compatibility that
the logic of a stochastic system and the logic of a deterministic system are
quite different. In the first the distributive property fails to hold in
general. To illustrate this, consider the propositions $S_1$ = "$y_1 > 0$"
and $S_2$ = "$y_2 > 0$", and let $S_2^*$ be the complement of $S_2$. Both
propositions are meaningful; the composite proposition $S_2 \vee S_2^*$,
which is the trivial proposition 1, is meaningful as well. Then
$S_1 \wedge (S_2 \vee S_2^*)$ is simply $S_1$, but since $S_1$ and $S_2$ are
incompatible, $(S_1 \wedge S_2) \vee (S_1 \wedge S_2^*)$ has no meaning.

The state of a system is represented by an atom (i.e. a proposition that is
only implied by the always false proposition 0) in the propositional system.
Conversely, if we are given all the propositions true for the system, then
the state is defined as the greatest lower bound of these propositions.
Conditioning is related to the notion of a filter. This means that if for
instance at time 1 the output is observed, and found with absolute certainty
in an interval J, then nothing changes this fact in the future (because of
causality). So from time 1 on, the statistical subensemble for which
$y_1 \in J$ has been filtered out. The axiomatic foundations of quantum
theory lead now to a suitable representation of such logics. An arbitrary
propositional system can be decomposed as the direct union of irreducible
propositional systems, and it is known [Va] that every irreducible
propositional system can be realized by the lattice Proj(H) of all closed
linear subspaces of a Hilbert space.

Generalized Probability Measures.

The goal of this section is to calculate the probability of obtaining
an answer "yes" for an arbitrary proposition of the system, "prepared in the
state" determined from the available data. The terminology "prepared in a
certain state" is standard for physicists. In our context, it simply means
that we look at an ensemble of systems whose prior information set is
identical to the observed time series; i.e. a filtered ensemble.

If the underlying propositional system is a Boolean σ-algebra, then the
Kolmogorovian probability theory gives a consistent definition for the
probability measures. They are defined on a measurable space (Ω,B). We
illustrated that since the probabilities are actually defined on B, it may
not be necessary to invoke Ω at all. B is the consistent collection of
logical statements that can be made about the physical system (note that the
σ-finiteness axiom cannot be empirically deduced, but is brought in for
mathematical convenience). It is possible to capture the classical
probability theory as a theory of measures on Boolean algebras. Here is the
problem: We want to make consistent probabilistic statements, but we lost
the Boolean structure of subsets of a set. How do we generalize a
probability so it can be defined on a non-Boolean logic, e.g. the logic of
subspaces of a Hilbert space?

Let H be a Hilbert space. The set of all closed subspaces of H has
the structure of an orthocomplemented complete lattice, also called a logic.
A one-to-one correspondence exists between the lattice of all closed
subspaces of H and the lattice Proj H of all orthoprojectors on H. Gleason
[G] has shown that in a separable Hilbert space of dimension at least three,
every measure μ on the closed subspaces can be represented by

$$\mu(A) = \operatorname{Tr}(T P_A) \tag{3}$$

where $P_A$ is the projection operator on the subspace A of H, and T is a
positive definite operator of trace class. Geometrically, these measures
arise as limits of convex combinations of "elementary" measures of the form

$$\mu_v(A) = \| P_A(v) \|^2 ; \quad v \in H \tag{4}$$

This notion has been extended by Jajte [J]. In particular it has been shown
that every (vector-valued) Gleason measure ξ on Proj H can be extended in a
unique way to a continuous operator on L(H), the algebra of all bounded
linear operators in H. An important class of Gleason measures taking values
in a Hilbert space K are the Orthogonally Scattered Measures (OSG).

Definition: An OSG-measure is a mapping ξ: Proj H → K for which:
i) For any sequence of pairwise orthogonal projectors $P_1, P_2, \ldots$
from Proj H

$$\sum_i \xi P_i = \xi \Big( \sum_i P_i \Big) \tag{5}$$

the series on the left hand side being weakly convergent,
ii) For any orthogonal projectors P, Q in Proj H

$$P \perp Q \Rightarrow \xi P \perp \xi Q \tag{6}$$

Any OSG defines a positive Gleason measure via

$$\mu P = \| \xi P \|^2 ; \quad P \in \text{Proj } H \tag{7}$$

By Gleason's theorem, there exists then a non-negative self-adjoint trace
class operator T such that

$$\| \xi P \|^2 = \operatorname{Tr} TP ; \quad P \in \text{Proj } H \tag{8}$$

The above can be interpreted as a "variance". A "covariance" can be defined
by $COV(P,Q) = \langle \xi P, \xi Q \rangle_K = \operatorname{Tr} TPQ$. In
fact, it can be shown that if H and K are real Hilbert spaces, then
$\forall P, Q \in \text{Proj } H$

$$\langle \xi P, \xi Q \rangle_K = \operatorname{Tr} TPQ = \operatorname{Tr} TQP \tag{9}$$

where T is given by Gleason's theorem. In quantum mechanics, T is known as
the density matrix.
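To make the trace formula for a Gleason measure concrete, the sketch below works in the finite-dimensional space R^3 with an operator T built from a single vector v (an "elementary" measure, left unnormalized). The vector, the choice of orthonormal directions, and all function names are assumptions made for illustration only.

```python
import math

def outer(u, v):
    return [[a * b for b in v] for a in u]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def line_projector(u):
    """Orthogonal projector onto the line spanned by u."""
    norm = math.sqrt(sum(c * c for c in u))
    unit = [c / norm for c in u]
    return outer(unit, unit)

def gleason_measure(t, p):
    """mu(P) = Tr(T P) for a projector P."""
    return trace(matmul(t, p))

def gleason_covariance(t, p, q):
    """COV(P, Q) = Tr(T P Q)."""
    return trace(matmul(t, matmul(p, q)))

v = [1.0, 2.0, 2.0]           # hypothetical state vector
T = outer(v, v)               # T = |v><v|, trace 9 (not normalized)
P1 = line_projector([1.0, 1.0, 0.0])
P2 = line_projector([1.0, -1.0, 0.0])
P3 = line_projector([0.0, 0.0, 1.0])
```

Here the three projectors are pairwise orthogonal and sum to the identity, so the measures add up to Tr T, the measure of P1 + P2 is the sum of the individual measures, and the "covariance" of two orthogonal projectors vanishes.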

Applications  to  Realization  Theory 

We assume now that we have N sample paths of length p of a stationary
time series, and organize it in a data matrix $Y \in R^{p \times N}$. This
entails no contradiction with our data-approach, as N shifted versions of
the same observed data can be used by virtue of the (assumed) stationarity.
In the abstract sense, we may also consider the exact realization problem.
In this case, we shall work with random variables rather than data matrices.
The underlying spaces are $L_2^p(\Omega,B,P)$ for the stochastic realization
problem, and $R^{p \times N}$ for the real data case. These spaces are
isomorphic respectively with the tensor product spaces

$$L_2^p(\Omega,B,P) = R^p \otimes L_2(\Omega,B,P) \tag{10}$$

$$R^{p \times N} = R^p \otimes R^N \tag{11}$$


More generally, if $\{\psi_i\}$ is a complete orthonormal set in H, then any
vector x in the tensor product space $G \otimes H^*$ (a space of operators
H → G) is of the form

$$x = \sum_i |x_i\rangle \langle \psi_i| \tag{12}$$

where $\{x_i\}$ is a family of "data"-vectors in G, and $\langle \psi_i|$ an
"evaluation"-vector (e.g. it picks out the i-th sample). The bra-ket
notation is used since it seemed to be the most clarifying notation. The
vector x will be referred to as the "conditioning". The inner product in
$G \otimes H^*$ is

$$\begin{aligned}
\langle x, y \rangle &= \Big\langle \sum_i |x_i\rangle \langle \psi_i| ,\ \sum_j |y_j\rangle \langle \psi_j| \Big\rangle \\
&= \sum_{i,j} \langle x_i | y_j \rangle_G \, \langle \psi_i | \psi_j \rangle_H \\
&= \sum_i \langle x_i | y_i \rangle_G
\end{aligned} \tag{13}$$

Let $\mu_i$ be the Gleason measures corresponding to $x_i$, i.e.
$\mu_i(A) = \|P_A(x_i)\|^2$. Introduce now a superposition of measures on
Proj G induced by this prior:

$$\mu_x = \sum_i \mu_i \tag{14}$$

For all subspaces A of G, it follows that

$$\mu_x(A) = \operatorname{Tr} T_x P_A \tag{15}$$

where $T_x = \sum_i |x_i\rangle \langle x_i| = xx'$ is interpreted as an
(unweighted) gramian or covariance operator. The measure $\mu_x(A)$ gives a
numeric value to the closeness of A to G, given the prior x. In a quantum
mechanical context, it would read as the (unnormalized) expectation value of
the observable represented by the projection operator $P_A$, when the system
is prepared in the state x.
The problem of determining the subspace of fixed dimension which "looks
most like H from the point of view of x" is then solved by letting $P_A$ be
the projector on the eigenspace of $T_x$ with the largest principal
components. Aragon and Couot [AC] also stated several equivalent problems
relating to the PCA (KLM). This did not lead to a useful consistent
covariance measure for two orthogonal subspaces of G.

The Canonical Correlation Analysis, and an alternative derivation of
the Principal Component Analysis are obtained as follows:

For each B ∈ Proj G, with $P_B$ the projector onto B in G, define the
operators

$$\hat P_B : G \otimes H^* \to G \otimes H^* \tag{16}$$

$$\hat P_B x = \sum_i P_B |x_i\rangle \langle \psi_i| = \sum_i (P_B |x_i\rangle) \langle \psi_i| \tag{17}$$

Note that, $\forall x \in G \otimes H^*$,

$$\begin{aligned}
(\hat P_B)^2 (x) &= \hat P_B \Big( \sum_i P_B |x_i\rangle \langle \psi_i| \Big) \\
&= \sum_i (P_B)^2 |x_i\rangle \langle \psi_i| \\
&= \sum_i P_B |x_i\rangle \langle \psi_i| \\
&= \hat P_B (x)
\end{aligned} \tag{18}$$

Thus $\hat P_B$ is a projection operator. Define further

$$\xi_x : \text{Proj } G \to G \otimes H^* : \quad \xi_x(B) = \hat P_B (x) \tag{19}$$

which is a vector valued (in $G \otimes H^*$) measure (operator valued if you
wish) satisfying:

i) For any set $\{P_i\}$ of pairwise orthogonal subspaces

$$\sum_i \xi_x(P_i) = \sum_{i,j} P_i |x_j\rangle \langle \psi_j| = \xi_x \Big( \sum_i P_i \Big) \tag{20}$$

ii) If $P \perp Q$, then

$$\xi_x(P) \perp \xi_x(Q) \tag{21}$$

Hence, $\xi_x$ is an OSG-measure. Then by (7), there exists a positive
measure $\mu_x$ such that

$$\mu_x(P) = \|\xi_x(P)\|^2 = \sum_i \| P |x_i\rangle \|_G^2 = \operatorname{Tr} P T_x P \tag{22}$$

This corresponds to a "coherent" addition of OSG measures, conditioned on x
(i.e. a posterior measure). It stems from the fact that $T_x : G \to G$ is a
characteristic for the given x in $G \otimes H^*$ (in fact, a "sufficient
statistic"), and one can think of $T_x$ (or $\mu_x$) as being conditioned by
the $x \in G \otimes H^*$. The posterior variance operator for subspace A,
and covariance of A and B ∈ Proj G, given x is then the operator G → G given
by

$$(\hat P_B x)(\hat P_A x)' = \sum_i P_B |x_i\rangle \langle x_i| P_A = P_B T_x P_A \in G \otimes G^* \tag{23}$$

This is simply interpreted as the restriction to B of the range of the map
$T_x|_A$ (restriction of $T_x$ to A), and displays the coupling of the
interface between A and B given x. If a norm $\|\cdot\|$ is chosen on the
space of operators $G \otimes G^*$, a scalar covariance measure can be
associated to this (co)variance operator. It naturally follows that the
correlation between the subspaces A and B in Proj G is

$$\rho(A,B|x) = \frac{\|P_A T_x P_B\|}{(\|P_A T_x P_A\| \, \|P_B T_x P_B\|)^{1/2}} \tag{24}$$

In particular, the Frobenius norm leads to $\rho_F$, which is the RV
coefficient (1) for the empirical data, used in the realization context in
[R,RV] and motivated in [V]; other norms (e.g. spectral) may be taken as
well. In the time series analysis, let the observed data be organized in a
data matrix $X \in R^{p \times N}$, and let $\{e_i\}$ be the standard
orthonormal basis in $R^p$. Then $T_x = XX'$ is the sample covariance matrix
S, and the correlation between the complementary subspaces
$K = \text{span}(e_1, \ldots, e_k)$ and
$K^\perp = \text{span}(e_{k+1}, \ldots, e_p)$ is (for the obvious partition
of S)

$$\rho_F = \frac{\operatorname{Tr}(S_{12} S_{21})}{(\operatorname{Tr} S_{11}^2 \, \operatorname{Tr} S_{22}^2)^{1/2}} \tag{25}$$
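The partitioned formula can be evaluated directly from a data matrix. The sketch below assumes a small hypothetical X (p = 3 rows, N = 4 samples), takes K as the span of the first k coordinate vectors, and also checks that Tr(S12 S21) equals the squared Frobenius norm of the coupling operator P_K S P_K⊥; the function names are illustrative.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def frobenius_sq(a):
    return sum(v * v for row in a for v in row)

def coordinate_projector(p, indices):
    """Projector onto the span of the standard basis vectors e_i, i in indices."""
    return [[1.0 if (i == j and i in indices) else 0.0 for j in range(p)]
            for i in range(p)]

def rho_f(x, k):
    """Formula (25) for the partition (first k rows | remaining rows)."""
    s = matmul(x, transpose(x))          # S = X X'
    s11 = [row[:k] for row in s[:k]]
    s12 = [row[k:] for row in s[:k]]
    s21 = [row[:k] for row in s[k:]]
    s22 = [row[k:] for row in s[k:]]
    num = trace(matmul(s12, s21))
    den = math.sqrt(trace(matmul(s11, s11)) * trace(matmul(s22, s22)))
    return num / den

# Hypothetical data: row 2 is twice row 1, row 3 is orthogonal to both.
X = [[1.0, -1.0, 2.0, -2.0],
     [2.0, -2.0, 4.0, -4.0],
     [1.0, 1.0, 1.0, 1.0]]
S = matmul(X, transpose(X))
PK = coordinate_projector(3, {0})
PK_perp = coordinate_projector(3, {1, 2})
coupling = matmul(PK, matmul(S, PK_perp))   # P_K S P_K-perp
```

For this X the coefficient is close to 1 because the first row is strongly coupled to the second, and the squared Frobenius norm of the coupling block reproduces the numerator of (25).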


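The PCA (KLM) and CCA are now both optimization problems, but with respect
to different constraints. With data transformations M and N on K and
$K^\perp$ respectively (considered as past and future in the Markovian
modeling problem), corresponding to the global transformation operator

$$L = M P_K + N P_K^{\perp} : G \to G \tag{26}$$

these problems are formulated as (O(K) is the orthogonal group on K)

$$\text{PCA (KLM)}: \quad \max_{M \in O(K)} \ \frac{\operatorname{Tr}(M T_{12} T_{21} M')}{(\operatorname{Tr}[(M T_{11} M')^2] \, \operatorname{Tr}[T_{22}^2])^{1/2}} \tag{27}$$

$$\text{CCA}: \quad \max_{M \in O(K),\ N \in O(K^\perp)} \ \frac{\operatorname{Tr}(M T_{12} N N' T_{21} M')}{(\operatorname{Tr}[(M T_{11} M')^2] \, \operatorname{Tr}[(N T_{22} N')^2])^{1/2}} \tag{28}$$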

It is shown in [RV] that the above problems are equivalent to a
generalized singular value decomposition. This approach is the "standard"
one of [AK,DP]. It also relates to the Procrustes problem [GV, p. 426]; i.e.
find an orthogonal matrix Q such that the Frobenius norm $\|A - QB\|$ is
minimal for given A and B.

Note that T is a covariance in the exact realization theory, while here
it corresponds to the sample covariance, respectively for
$G \otimes H^* = L_2^p(\Omega,B,P)$ or $R^{p \times N}$. Discriminant
analysis and a rational way for discarding variables in
multivariate  statistics  can  also  be  treated  in  this  way.  The  use  of  more 
general  probability  measures  in  pattern  recognition  has  already  been 
explored  by  Watanabe  [W] . 

Finally, modifications can be made to minimize the "end effects" due to
substitution  of  zeros  where  data  is  missing  in  the  time  series,  by  using  a 
weighted  linear  superposition  of  states. 


Conclusions  and  Outlook 

The RV-coefficient, used successfully in the unification of various
stochastic realization approaches, has been linked to the generalized
measures defined on the subspaces of a Hilbert space. The logics defined on
these spaces are similar to the representations of the logics occurring
in quantum mechanics and allow for the definition of generalized measures.
More importantly, we have indicated that a similar approach can be used in
the realization problems. We have only discussed some details for the
stochastic realization, which has the same structure as a purely quantal
system in physics. The application of this logico-algebraic approach in the
deterministic realization theory is pursued elsewhere. We only mention that
in this case a usual Boolean (or classical) logic results, and a theory of
approximation and modeling can be based on set theoretic measures. The
quantum "probability" is different from the Kolmogorov probability. The
"logic" is not the one of subsets (Boolean lattice), but the logic of
subspaces. The first is a special case: the set of orthogonal subspaces
spanned by subsets of a complete orthonormal basis forms a distributive
logic. How can we further reconcile the analogy with quantum mechanics?
Quantum experiments are non-repeatable, i.e. identical realizations are not
possible, whereas for identically prepared classical systems, identical
propositions follow. Think of the analogy with an ensemble of
noninteracting, identically prepared deterministic systems. By definition,
each of these gives necessarily the same response, whereas this will not be
the case for a stochastic ensemble, because of the different inaccessible
noises in each realization. Instead of working with an ensemble of
realizations in parallel, we can work with a single realization, but
serially in time, if we assume time-invariant systems and stationary noises.
Roughly speaking we then have the following connections:

Kolmogorov measures          ....  Gleason measures
Classical Mechanics          ....  Quantum Mechanics
Deterministic Realizations   ....  Stochastic Realizations
ABSTRACT


The multivariate time series identification problem is approached in this
paper from a canonical variate analysis point of view. Two different but
related problems are extensively studied, namely the generalized symmetric and
generalized unsymmetric stochastic realization problems. These are associated
with the problem of finding linear transformations (basis vectors) of the
forward and/or backward predictor spaces of a second-order stationary vector
stochastic process. These basis vectors correspond to the states of forward
and/or backward Kalman filter models, respectively. A new form of solution is
presented which provides a unified framework for solving these two related
problems and, in addition, motivates algorithmic development. This unified
framework, known as the RV-coefficient approach, is used to generalize
previously known results in stochastic realization theory and to generate
several new ones. In particular, it is shown that the canonical realization
algorithm and the Karhunen-Loeve method, which solve the symmetric and
unsymmetric stochastic realization problems, respectively, can be derived under
this unified framework. More importantly, the RV-coefficient provides a common
statistical measure of information that can be used as a tool for comparing
performance between algorithms and for obtaining appropriate reduced-order
models. The normalized balanced realizations found in deterministic realization
theory are extended to the stochastic case and shown to have some optimality
properties in the RV-coefficient sense. Also, the problem of transforming a
given pair of forward-backward innovations representations to a certain
canonical form (coordinate-free) is shown to be, in the RV-coefficient sense, of
the same format as that of the stochastic realization problem.
I. INTRODUCTION


The  time  invariant  stochastic  realization  problem  can  be  defined  as  the 
problem  of  finding  a  finite  dimensional  Markovian  representation  (state-space 
model)  from  knowledge  of  the  autocovariance  sequence  of  a  second-order 
stationary  stochastic  process.  This  problem  has  received  a  great  deal  of 
attention  in  the  recent  past  due  to  its  fundamental  importance  in  system 
identification,  digital  filtering,  signal  processing,  and  time  series  modeling 
[1]  -  [11].  In  many  applications,  the  Markovian  representation  or  state-space 
model  may  be  computationally  unmanageable  due  to  its  high  dimension.  This  may 
be  caused  by  the  introduction  of  superfluous  state  components  due  to  noise 
perturbations  in  the  covariance  structure  of  the  stochastic  process  as  a  result 
of roundoff errors, inexact covariance estimates, etc. The solution then calls
for an approximate or reduced-order model, which may be obtained directly from
the  solution  to  the  stochastic  realization  problem,  provided  it  yields  a 
coordinate-free  representation  [7]  -  [11]. 

Faurre  [4]  has  clarified  the  algorithmic  aspects  of  the  stochastic 
realization  problem  by  characterizing  the  set  of  all  possible  Markovian 
representations  in  terms  of  extreme  point  or  canonical  representations.  This 
canonical structure was further extended by Akaike [3], [12], [13], who has
developed a stochastic realization theory based on the information interface
between the past and future of a stochastic process and the concepts of
predictor spaces and canonical variables. This theory has been fundamental to
modern stochastic realization algorithms, which, in an optimal way, attempt to
approximate  the  information  interface  between  the  past  and  future  of  the 
stochastic  process. 


To fix the ideas, let us mathematically define the stochastic realization
problem as follows. Given a zero mean, rational, discrete-time, stationary,
vector stochastic process $\{y_k\}$, find a Markovian representation of the
form

$$x_{k+1} = F x_k + w_k \tag{1a}$$

$$y_k = H x_k + v_k \tag{1b}$$

such that it has minimal dimension (n) and the output (1b) generates the same
autocovariance function as that of the process $\{y_k\}$. Here $x_k$ is the
(n×1) state vector process, $w_k$ and $v_k$ are respectively (n×1) and (m×1)
zero mean white Gaussian noise processes, $y_k$ is the (m×1) output vector,
and the parameter matrices F and H are of appropriate dimensions. Furthermore,
the noise processes $w_k$ and $v_k$ have the following joint covariance
structure:

$$E\left\{ \begin{bmatrix} w_k \\ v_k \end{bmatrix} \begin{bmatrix} w_s^T & v_s^T \end{bmatrix} \right\} = \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix} \delta_{ks} \tag{2}$$

where $\delta_{ks}$ is the Kronecker delta function, E is the expectation
operator, $A^T$ denotes the matrix transpose of A, and Q, S, and R are
constant matrices of appropriate dimensions such that $Q \ge 0$, $R > 0$, and
[a third condition, illegible in the source], where $A > 0$ ($\ge 0$) refers
to a matrix A which is positive definite (positive semi-definite). In
addition, (1) is Markov with forward propagation property

$$E\{x_k w_s^T\} = E\{x_k v_s^T\} = 0, \quad s \ge k \tag{3}$$

and output autocovariance function given by

$$\Lambda(k) = [H F^{k-1} G]\, 1(k) + [G^T (F^{-k-1})^T H^T]\, 1(-k) + [\Lambda(0)]\, \delta_{k0} \tag{4}$$

where

$$1(k) = \begin{cases} 1 & \text{if } k > 0 \\ 0 & \text{otherwise} \end{cases}$$

$$G = F \Sigma H^T + S$$

$$\Lambda(0) = H \Sigma H^T + R$$

$$\Sigma = F \Sigma F^T + Q$$

and $\Sigma = E\{x_k x_k^T\}$ is the (n×n) positive definite state covariance
matrix.
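A scalar worked example may help in reading the autocovariance formula. The sketch below assumes n = m = 1, so that F, H, Q, R and S reduce to numbers f, h, q, r and s with |f| < 1; the parameter values and function names are hypothetical. It evaluates the closed form and compares it with a direct propagation of the cross-covariance between the state and the output.

```python
def stationary_variance(f, q):
    """Solve sigma = f * sigma * f + q for a stable scalar system."""
    if abs(f) >= 1.0:
        raise ValueError("a stationary variance requires |f| < 1")
    return q / (1.0 - f * f)

def autocovariance(k, f, h, q, r, s):
    """Closed-form output autocovariance at lag k (scalar case)."""
    sigma = stationary_variance(f, q)
    g = f * sigma * h + s                 # G = F Sigma H' + S
    if k == 0:
        return h * sigma * h + r          # Lambda(0) = H Sigma H' + R
    return h * f ** (abs(k) - 1) * g      # H F^(|k|-1) G, symmetric in k

def autocovariance_by_recursion(k, f, h, q, r, s):
    """Lag k >= 1 obtained by propagating c_j = E{x_(t+j) y_t}."""
    sigma = stationary_variance(f, q)
    c = f * sigma * h + s                 # c_1 = E{(f x_t + w_t)(h x_t + v_t)}
    for _ in range(k - 1):
        c = f * c                         # c_(j+1) = f c_j, future noise is uncorrelated
    return h * c

# Hypothetical parameters; [[q, s], [s, r]] is positive definite here.
params = dict(f=0.5, h=2.0, q=3.0, r=1.0, s=0.5)
```

With these values the state variance is 4, G is 4.5, and the autocovariance sequence starts 17, 9, 4.5, decaying geometrically with ratio f after lag 1.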

The stochastic realization problem then amounts to identifying a triple
$(F,G,H)_n$ of minimal dimension n and covariance matrices (Σ,Q,R,S) such that
(2) - (4) are satisfied. The motivation for solution is due to Akaike [3],
[12], [13] and follows.

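Let the past and future of $\{y_k\}$ be defined, respectively, by

$$Y_k^- = \begin{bmatrix} y_{k-1} \\ y_{k-2} \\ \vdots \end{bmatrix} \ \text{(past)}, \qquad Y_k^+ = \begin{bmatrix} y_k \\ y_{k+1} \\ \vdots \end{bmatrix} \ \text{(future)} \tag{5}$$

and let the block Hankel matrix, $\mathcal{H}$, along with the respective
covariance matrices, $R^-$ and $R^+$, be defined as [8]

$$\mathcal{H} = E\{Y_k^+ (Y_k^-)^T\} \tag{6}$$

$$R^- = E\{Y_k^- (Y_k^-)^T\} \tag{7}$$

$$R^+ = E\{Y_k^+ (Y_k^+)^T\} \tag{8}$$

Now define, respectively, the forward and backward predictor spaces of
$\{y_k\}$ as

$$\text{span}[Y_k^+ \,|\, Y_k^-] = \text{span}[\mathcal{H} (R^-)^{-1} Y_k^-] \tag{9a}$$

$$\text{span}[Y_k^- \,|\, Y_k^+] = \text{span}[\mathcal{H}^T (R^+)^{-1} Y_k^+] \tag{9b}$$

where, for zero mean random vectors a and b,
$[a|b] = E\{ab^T\} E\{bb^T\}^{-1} b$ denotes the orthogonal projection of a
onto the Hilbert space of random variables spanned by the components of b.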

The predictor spaces (9) have infinite components; however, Akaike [3],
[13] has shown that the states of a forward and backward Markovian
representation are finite dimensional basis vectors of the predictor spaces.
Thus, finding a basis respectively for (9a) and (9b) is a matter of finding
linear transformation matrices A and B such that the transformed vectors
(basis vectors of dimension n), called the state vectors, $x_k$ and
$z_{k-1}$, are orthogonal, i.e.,

$$x_k = A^T \mathcal{H} (R^-)^{-1} Y_k^- \tag{10a}$$

$$z_{k-1} = B^T \mathcal{H}^T (R^+)^{-1} Y_k^+ \tag{10b}$$

with orthogonality property defined by

$$E\{x_k x_k^T\} = \Lambda_x = \text{diag}[\delta_{x1}, \delta_{x2}, \ldots, \delta_{xn}] \tag{11a}$$

$$E\{z_{k-1} z_{k-1}^T\} = \Lambda_z = \text{diag}[\delta_{z1}, \delta_{z2}, \ldots, \delta_{zn}] \tag{11b}$$

Notice that $z_k$ is the state of a backward Markov model which is dual to (1)
and evolves in the opposite direction of time (see [14] - [16]). If we let
$M = A^T \mathcal{H} (R^-)^{-1}$ and $L = B^T \mathcal{H}^T (R^+)^{-1}$, where
$\mathcal{H}$ is the block Hankel matrix, then the covariance based stochastic
realization problem reduces to that of finding linear transformation matrices
L and/or M such that (10a) and (11a) and/or (10b) and (11b) are satisfied.


Fortunately, the solution to this problem is a classical one in
multivariate analysis and has led the way to recent stochastic realization
algorithms.  These  algorithms  have  the  added  feature  of  implicitly  solving  the 
Riccati  equations  arising  in  an  earlier  algorithm  due  to  Faurre  [4],  [57]. 

With  this  canonical  structure,  Akaike  [13]  extended  the  Ho-Kalman  algorithm 
[17]  to  the  stationary  stochastic  case,  and  Baram  [11]  accounted  for  the 
nonstationary  stochastic  processes  generalization.  Along  the  same  lines,  Desai 
and  Pal  [8]  introduced  the  canonical  realization  algorithm  (CRA)  for  balanced 
stochastic  realizations  (defined  in  the  sequel),  while  Arun  and  Kung  [9] 
developed the Karhunen-Loeve Method (KLM) for solving a one-sided problem (i.e.,
$x_k$ or $z_{k-1}$). In an attempt to unify existing stochastic realization algorithms,
Larimore [7] developed the generalized canonical variate method, which
reduces to CRA or KLM as specific cases. Other forms or variants
of these algorithms have been given in [18] - [20]. These algorithms are all
grounded in some form of multivariate statistical analysis.

Earlier  attempts  to  unify  existing  stochastic  realization  algorithms  have 
partly failed due to a lack of a common statistical measure of information that
can  be  used  as  a  rationale  for  drawing  inferences  about  the  performance  of  the 
algorithms  or  for  model  reduction  purposes.  Meanwhile,  existing  measures  may 
lead  to  results  which  differ  in  magnitude  depending  on  the  type  of  solution,  and 
therefore,  can  create  a  problem  of  interpretation. 

Escoufier [21] and Robert and Escoufier [31] introduced the RV-coefficient
statistic as a tool for solving a large class of problems arising in
multivariate statistical analysis. This solution approach, although it has not
received much attention in the literature, shows future promise in stochastic
realization theory as well as signal processing, pattern recognition, and
discriminant analysis, to name only a few [10], [22], [23]. In this paper we
solve the covariance based stochastic realization problem from an RV-coefficient
point of view and show that the solutions leading to different algorithms can
all be put under this common framework of analysis.

Ramos [10] and Ramos and Verriest [30] showed that the RV-coefficient
approach leads naturally to Akaike's stochastic realization theory [13]. In
addition, it serves as a tool for algorithmic development and introduces a
common statistical measure of information in standardized units (i.e., in the
interval [0,1]), which can aid significantly in the interpretation of the
results. This allows the modeler to compare several algorithms and models as
well as model reduction techniques. Recently Verriest [22], [23] extended
Escoufier's RV-coefficient to a geometrical framework which, based on certain
operator valued measures, defines the correlation between subspaces of a Hilbert
space. The approach is motivated by the procrustean problem [32] and leads to
several different RV-type measures. Other measures of multivariate association
between two sets of random variables are suggested in [33] - [35].

We begin the next section by reviewing the statistical measures used in
existing stochastic realization algorithms and discuss their limitations. In
Section 3, we solve the general, symmetric stochastic realization problem and
its unsymmetric version, both from an RV-coefficient point of view, and further
relate the results to the works of Desai and Pal [8] and Arun and Kung [9],
respectively. We further introduce several other theoretical extensions and
present a unification of existing stochastic realization algorithms. Section 4
treats the problem of transforming a given forward-backward pair of innovations
representations into the canonical structure of the symmetric or unsymmetric
stochastic realization problems, by taking a direct route via the RV-coefficient
approach. Discussions of the present work and conclusions are contained in
Section 5.
II.  COMPARISON OF EXISTING PERFORMANCE MEASURES AND
MOTIVATION FOR UNIFICATION

Since Hotelling's classic paper [24], the theory of canonical variate
analysis has been fortified with several measures for quantitatively assessing
the degree of association between two sets of random vectors. These measures
have been categorized by Cramer and Nicewander [25] into two distinctive
classes: redundancy measures and canonical correlation measures. The former
type are associated with the predictability of one component with respect to
the other, and thus form the basis for the methods of principal components of
instrumental variables [26], external single set components analysis [27], and
redundancy analysis [28], all of which seem to be equivalent. Canonical
correlation measures, on the other hand, are measures of multivariate association
that attempt to extend the correlation coefficient to two sets of random
vectors. These measures form the basis for the methods of canonical correlation
analysis, multivariate analysis of variance, and multivariate regression [58].
Canonical correlation based measures are symmetric, as opposed to redundancy
measures, which are asymmetric.

Recently, Larimore [7] introduced a generalized measure of multivariate
association which, depending on the type of problem, acts either as a redundancy
measure or a canonical correlation measure. For this reason, we classify it as
a combined performance measure.

Several performance measures have been developed for use in stochastic
realization theory, especially with the model approximation problem. These can
also be classified under the above-mentioned categories; however, before we
attempt to do so, we first need to look at the symmetry aspects of the
stochastic realization problem and the underlying form of the particular type of
solution.


Recall that $Y_k^-$ and $Y_k^+$ are respectively the past and future of the
stochastic process $\{y_k\}$, whose joint covariance matrix is given by

$$E\left\{ \begin{bmatrix} Y_k^- \\ Y_k^+ \end{bmatrix} \begin{bmatrix} (Y_k^-)^T & (Y_k^+)^T \end{bmatrix} \right\} = \begin{bmatrix} R^- & H^T \\ H & R^+ \end{bmatrix} \qquad (12)$$

Further recall that the stochastic realization problem is associated with
the problem of finding transformation vectors (also canonical or state vectors)
$x_k = M^T Y_k^-$ and/or $z_{k-1} = L^T Y_k^+$ such that these are orthogonal. In general, the
simultaneous estimation of the transformation matrices $M$ and $L$ will lead to
canonical correlation based measures. Here the joint covariance matrix of the
state vectors is given by
where $\Gamma = \mathrm{diag}[\gamma_1, \ldots, \gamma_n]$ is an ($n \times n$) diagonal matrix of squared
canonical correlation coefficients between $Y_k^-$ and $Y_k^+$, $n = \mathrm{rank}[H]$ corresponds
to the dimension of the state vector, while the other terms are as defined
previously. Condition (13) is known in multivariate analysis as the
bi-orthogonality condition. In the stochastic realization problem this is
equivalent to the problem having a symmetric solution, i.e., the state-space
models characterized by $z_{k-1}$ and $x_k$ are dual to each other.

On the contrary, if one is interested in estimating $M$ and $L$ independently,
i.e., as two separate problems, then the joint problem becomes unsymmetric and,
in general, will lead to a pair of independent redundancy based measures. In
this case, we have the following joint state covariance structure

(14)

As expected, (14) lacks bi-orthogonality since the off-diagonal matrices are
not diagonal, as a result of the independent orthogonalizations of $z_{k-1}$ and $x_k$.
As we will see later, this lack of symmetry will lead to state vectors $z_{k-1}$ and
$x_k$ that are not dual to each other, as opposed to those obtained satisfying (13).

We  now  continue  our  specific  discussion  with  a  brief  description  of  the 
different  types  of  performance  measures  used  in  conjunction  with  the  stochastic 
realization  problem. 


A.  Canonical  Correlation  Measures 

These measures are not, in general, associated with prediction, but
rather with maximum correlation or similarity between two sets of random
vectors. However, Yohai and Garcia Ben [36] have shown that solving

$$\min_M \rho(Y_k^+, M^T Y_k^-) = \left| E\{(Y_k^+ - \hat{Y}_k^+)(Y_k^+ - \hat{Y}_k^+)^T\} \right| = |R^+| \prod_{i=1}^{n} (1 - \gamma_i) \qquad (15)$$

subject to the orthogonality constraint $M^T (R^-) M = \Delta_x$, leads to a measure of
prediction accuracy (or confidence) of $Y_k^+$ based on $x_k = M^T Y_k^-$, i.e.,

$$\hat{Y}_k^+ = E[Y_k^+ \mid x_k = M^T Y_k^-] = H M [M^T (R^-) M]^{-1} M^T Y_k^- \qquad (16)$$

is the best linear predictor of $Y_k^+$ among all predictors of the form $\hat{Y}_k^+ = D x_k$,
where $D$ is any ($\infty \times n$) matrix. Here we have taken $\Delta_x = I_n$ for convenience.

It should be noted, however, that (15) involves solving the following
generalized eigenvalue-eigenvector problem

$$M^T H^T (R^+)^{-1} H M = \Gamma \qquad (17a)$$

$$M^T (R^-) M = \Delta_x = I_n \qquad (17b)$$

However, if we minimize the anti-causal function $\rho(L^T Y_k^+, Y_k^-)$ with $\Delta_z = I_n$, then
the solution involves a generalized eigenvalue-eigenvector problem dual to (17)
whose solution is given by

$$\min_L \rho(L^T Y_k^+, Y_k^-) = |R^-| \prod_{i=1}^{n} (1 - \gamma_i) \qquad (18)$$


One can see that (15) and (18) are related to one another by the factor
$w = \prod_{i=1}^{n} (1 - \gamma_i)$, which is called the alienation coefficient [24]. This
coefficient, which carries information from both the forward and backward models,
is a measure of independence between $Y_k^-$ and $Y_k^+$. Therefore, (15) and (18),
being constant multiples of $w$, also reflect the independence between $Y_k^-$ and
$Y_k^+$. Hence, there is practically no gain in information by solving (15) and/or
(18) separately as in [36]. It has been shown in [37] that since (15) and (18)
maximize a determinant as opposed to a trace, they cannot be considered
prediction measures. Hence, we classify them as canonical correlation measures.
Desai and Pal [8] solved both problems simultaneously by performing a singular
value decomposition of a weighted Hankel matrix, arriving at the following
information based measure due to Gelfand and Yaglom [38]:

$$I(Y_k^-; Y_k^+) = -\tfrac{1}{2} \log \prod_{i=1}^{n} (1 - \gamma_i) \qquad (19)$$

Notice that (19) is of the form $a \log w$, where $a$ is a constant; therefore, it
also reflects the independence between $Y_k^-$ and $Y_k^+$ [7]. Furthermore, when a
canonical correlation coefficient is unity, (15) and (18) are equal to zero,
while (19) degenerates to infinity.
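The relation between (15) and the alienation coefficient can be checked numerically on a finite truncation of the past and future. The following is a minimal sketch, assuming NumPy and finite (3 x 3) blocks in place of the infinite matrices; the helper names `inv_sqrt` and `squared_canonical_correlations` and the random test covariance are illustrative assumptions, not part of the original report.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def squared_canonical_correlations(R_plus, R_minus, H):
    """Squared canonical correlations between future and past.

    H is the cross-covariance E{Y+ (Y-)^T}. The singular values of
    (R+)^(-1/2) H (R-)^(-1/2) are the canonical correlations.
    """
    s = np.linalg.svd(inv_sqrt(R_plus) @ H @ inv_sqrt(R_minus), compute_uv=False)
    return s ** 2

# Hypothetical finite joint covariance of [Y-; Y+] (assumption for illustration).
rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 6))
joint = Z @ Z.T + 6 * np.eye(6)
R_minus, R_plus, H = joint[:3, :3], joint[3:, 3:], joint[3:, :3]

gamma = squared_canonical_correlations(R_plus, R_minus, H)
w = np.prod(1.0 - gamma)  # alienation coefficient

# Determinant of the prediction error covariance of Y+ given Y-,
# compared with |R+| * w as in (15) when all components are retained.
err_cov = R_plus - H @ np.linalg.solve(R_minus, H.T)
lhs = np.linalg.det(err_cov)
rhs = np.linalg.det(R_plus) * w
```

In this full-rank truncation both sides agree to rounding error, which illustrates why (15) and (18) carry the same information as $w$.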

Another measure which has been in use for some time is [32]

$$\inf_{A:\ \mathrm{rank}[A] = n} \|H - A\|_F^2 = \sigma_{n+1}^2(H) + \sigma_{n+2}^2(H) + \cdots \qquad (20)$$

where $\|\cdot\|_F$ denotes the Frobenius norm (i.e., $\|A\|_F = [\mathrm{tr}(A^T A)]^{1/2}$) and
$\sigma_i(H)$ is the $i$th largest singular value of $H$. The
matrix $A$ represents the best $n$th order approximation to the Hankel matrix $H$. If
one divides (20) by $\|H\|_F^2$, then we obtain

$$\frac{\inf_{A:\ \mathrm{rank}[A] = n} \|H - A\|_F^2}{\|H\|_F^2} = \frac{\sum_{i > n} \sigma_i^2(H)}{\sum_{i \ge 1} \sigma_i^2(H)} \qquad (21)$$

which represents the degree of deterioration in the approximation. We have used
$\infty$ here as an upper bound for $H$, $R^-$ and $R^+$; however, in practice one should use
a finite upper bound.


B.  Redundancy Based Measures

As mentioned earlier, redundancy measures are associated with the
prediction efficiency of one of the components ($Y_k^-$ or $Y_k^+$) with respect to the
other. Arun and Kung [9] (see also [39] - [41]) computed $x_k = M^T Y_k^-$ by
performing a one-sided Karhunen-Loeve expansion of the forward predictor
space $\mathrm{span}[Y_k^+ \mid Y_k^-]$. The criterion used was based on minimizing the
following prediction efficiency measure

$$\psi(Y_k^+, M^T Y_k^-) = \mathrm{tr}[R^+] - \mathrm{tr}[H M [M^T (R^-) M]^{-1} M^T H^T] \qquad (22)$$

For convenience of interpretation, we divide (22) by $q = \mathrm{tr}[R^+]$, the total
variance in $Y_k^+$, to obtain

$$\phi(Y_k^+, M^T Y_k^-) = 1 - \sum_{i=1}^{n} \frac{\delta_{xi}}{q} \qquad (23)$$

where $\delta_{xi}$ is the variance of the $i$th state component $x_k^i$ and corresponds to the
$i$th largest eigenvalue of $H M [M^T (R^-) M]^{-1} M^T H^T$. The measure (23) then represents
the proportion of variance unaccounted for by the state-space model with $x_k$ as
the $n$-dimensional state vector. Similarly, for estimating $z_{k-1} = L^T Y_k^+$, one can
obtain

$$\phi(L^T Y_k^+, Y_k^-) = 1 - \sum_{i=1}^{n} \frac{\delta_{zi}}{p} \qquad (24)$$

where $p = \mathrm{tr}[R^-]$ and $\delta_{zi}$ is the $i$th largest eigenvalue of $H^T L [L^T (R^+) L]^{-1} L^T H$. In
general, when $y_k$ is not a scalar or when $\Lambda(k) \neq \Lambda(-k)$, $\phi(Y_k^+, M^T Y_k^-) \neq \phi(L^T Y_k^+, Y_k^-)$
due to lack of symmetry in the individual solutions. In this case, the
eigenvalues in (23) and (24) are different, as opposed to those in (15) and (18),
which are the common squared canonical correlation coefficients.

Notice that $(\delta_{xi}/q)$ in (23) represents the explained fractional variance of
$Y_k^+$ by the $i$th state component $x_k^i$. The same holds true for (24) with respect
to the variance of $Y_k^-$ in terms of $z_{k-1}^i$. Thus, (23) and (24) are quantitative
measures that account for the distribution of the variance of $Y_k^+$ and $Y_k^-$ by
their respective canonical components (state vectors). Unfortunately, this is
not the case for canonical correlation based measures. In [10], [30] a
redundancy index was derived from the canonical correlation solution in order to
determine the individual contribution of the state components to the total
variance of $Y_k^+$ and $Y_k^-$.
C.  Combined Measures


The only measure found in this category is due to Larimore [7] and in our
notation is defined as

$$E\{(Y_k^+ - \hat{Y}_k^+)^T \Theta^{-1} (Y_k^+ - \hat{Y}_k^+)\} = \sigma_{n+1} + \sigma_{n+2} + \cdots \qquad (25)$$

where the $\sigma_i$'s are the singular values of a well-defined matrix. When $\Theta = I$, (25) is
a redundancy measure and when $\Theta = R^+$, it becomes a canonical correlation measure
with the $\sigma_i$'s as the squared canonical correlation coefficients.

One can see that depending on the form of $\Theta$, the magnitude of (25) can
change drastically, therefore presenting a problem of interpretation if one
wants to compare both modeling approaches. Similarly, all other performance
measures  are  derived  based  on  one  type  of  problem  in  mind  (i.e.,  the  symmetric 
or  unsymmetric  stochastic  realization  problem),  and  have  no  equivalent  measure 
for  the  converse  problem.  Therefore,  if  we  want  to  solve  both  types  of 
stochastic  realization  problems  by  the  different  methods  available,  then  we 
cannot  compare  the  results  simply  because  there  is  no  common  measure  that  yields 
the  same  units  (i.e.,  variance,  correlation,  information,  similarity,  etc.).  We 
are  then  faced  with  a  problem  of  interpretation. 

To  overcome  this  difficulty,  we  will  next  present  a  unified  measure  of 
similarity,  known  as  the  RV-coefficient,  which  will  allow  us  to  unify  previous 
algorithms  under  a  common  framework  of  analysis. 
III.  SOLUTION TO THE COVARIANCE BASED STOCHASTIC REALIZATION PROBLEM:

THE  RV-COEFFICIENT  APPROACH 

A.  Mathematical  Preliminaries 

Consider a particular sample realization of two zero mean stationary vector
stochastic processes $\{x_k\}$ and $\{y_k\}$ of dimension ($p \times 1$), whose squared Euclidean
distance is given by

$$D^2(x_k, y_k) = \|x_k - y_k\|^2 = E\{(x_k - y_k)^T (x_k - y_k)\} \quad \forall k$$

$$= E\{x_k^T x_k\} - 2E\{x_k^T y_k\} + E\{y_k^T y_k\}$$

$$= s_x^2 - 2 s_{xy} + s_y^2$$

Replacing $x_k$ and $y_k$ by their respective normalized vectors, $\bar{x}_k$ and $\bar{y}_k$, it
follows immediately that

$$D^2(\bar{x}_k, \bar{y}_k) = 2(1 - r_{xy})$$

where $r_{xy}$ is the correlation coefficient between $x_k$ and $y_k$.

Now, suppose we collect all the information available in $x_k$ and $y_k$ for
$k \in [1,N]$ and form the ($p \times N$) data matrices $X$ and $Y$ ($X$ and $Y$ may have different row
dimensions). Then $X$ induces a configuration $C(X)$ of $N$ points in $R^p$ with
relative distance matrix given by [31]

$$D^*(X) = \frac{S(X)}{[\mathrm{tr}[S(X)^2]]^{1/2}}$$

where $S(X) = X^T X$. A similar expression for $D^*(Y)$ may be obtained from $Y$. These
distance matrices are translation and rotation invariant and also invariant to
global changes of scale. Then by making use of the scalar product $\mathrm{tr}(A^T B)$ for
square matrices $A$ and $B$, and its induced norm, $\|A\| = [\mathrm{tr}(A^T A)]^{1/2}$ (notice that
$\|D^*(X)\| = \|D^*(Y)\| = 1$), the distance between the two configurations $C(X)$ and
$C(Y)$ can be defined in a form similar to the relative distance between two
vectors as


$$D^2[C(X), C(Y)] = \|D^*(X) - D^*(Y)\|^2 = 2\left[1 - \frac{\mathrm{tr}[S(X)S(Y)]}{[\mathrm{tr}[S(X)^2]\,\mathrm{tr}[S(Y)^2]]^{1/2}}\right] = 2[1 - RV(X,Y)] \qquad (29)$$

where,

$$RV(X,Y) = \frac{\mathrm{tr}[S(X)S(Y)]}{[\mathrm{tr}[S(X)^2]\,\mathrm{tr}[S(Y)^2]]^{1/2}} = \frac{\mathrm{tr}[S_{xy} S_{yx}]}{[\mathrm{tr}[S_{xx}^2]\,\mathrm{tr}[S_{yy}^2]]^{1/2}} \qquad (30)$$

Here, $S_{xx} = XX^T$, $S_{yy} = YY^T$, $S_{xy} = XY^T$, and $S_{yx} = YX^T$ are, within a
multiplicative factor of $1/N$, the covariances and cross-covariances between $X$
and $Y$. Immediately we can see that (29) is a generalization of the distance
between two vectors and $RV(X,Y)$ is a measure for the correlation between the
configurations $C(X)$ and $C(Y)$.
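The definitions (29) and (30) translate directly into a few lines of linear algebra. The following is a minimal sketch, assuming NumPy and zero mean data matrices whose columns are the $N$ samples; the function names and the synthetic data are illustrative assumptions rather than anything taken from the report.

```python
import numpy as np

def rv_coefficient(X, Y):
    """Sample RV-coefficient of data matrices X (p x N) and Y (q x N), as in (30)."""
    Sxy = X @ Y.T
    Sxx = X @ X.T
    Syy = Y @ Y.T
    return np.trace(Sxy @ Sxy.T) / np.sqrt(np.trace(Sxx @ Sxx) * np.trace(Syy @ Syy))

def relative_distance_matrix(X):
    """D*(X) = S(X) / sqrt(tr[S(X)^2]) with S(X) = X^T X (an N x N matrix)."""
    S = X.T @ X
    return S / np.sqrt(np.trace(S @ S))

# Synthetic data (assumption): Y is a noisy linear image of part of X,
# with a different row dimension than X.
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 50))
Y = 0.5 * X[:2] + 0.1 * rng.standard_normal((2, 50))

rv = rv_coefficient(X, Y)

# Squared distance between the two configurations, left-hand side of (29).
d2 = np.linalg.norm(relative_distance_matrix(X) - relative_distance_matrix(Y), "fro") ** 2
```

Here `d2` and `2 * (1 - rv)` agree to rounding error, which is the content of (29).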
The sample RV-coefficient shares some of the properties of a squared
correlation coefficient, i.e., has values in [0,1]. The closer to 1 it is, the
closer are the patterns and the better is $Y$ (or $X$) as a substitute for $X$ (or $Y$) in
characterizing the sample space. Summarized below are other properties of the
RV-coefficient [31], [42], [43]:

i.  $D[C(X), C(Y)]$ is a decreasing function of $RV(X,Y)$.

ii.  $RV(X,Y) = RV(Y,X)$

iii.  $RV(A^T X, Y) = RV(X,Y)$ when $A$ is orthogonal

iv.  $RV(X,Y) = 0$ if and only if $XY^T = 0$

v.  $RV(X,Y) = 1$ if and only if $X = kY$ for some nonzero scalar $k$.

vi.  $RV(aX, Y) = RV(X,Y)$ for any nonzero scalar $a$.
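Properties ii, iii, v, and vi above are easy to confirm numerically. The sketch below assumes NumPy; the compact helper `rv` and the random test matrices are illustrative assumptions, and the check of property v covers only the "if" direction.

```python
import numpy as np

def rv(X, Y):
    """RV-coefficient of X (p x N) and Y (q x N), following (30)."""
    Sxy = X @ Y.T
    Sxx = X @ X.T
    Syy = Y @ Y.T
    return np.trace(Sxy @ Sxy.T) / np.sqrt(np.trace(Sxx @ Sxx) * np.trace(Syy @ Syy))

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 30))
Y = rng.standard_normal((4, 30))
A, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random orthogonal matrix

sym = np.isclose(rv(X, Y), rv(Y, X))            # property ii: symmetry
rot = np.isclose(rv(A.T @ X, Y), rv(X, Y))      # property iii: orthogonal invariance
scale = np.isclose(rv(-3.0 * X, Y), rv(X, Y))   # property vi: scale invariance
unit = np.isclose(rv(X, 2.5 * X), 1.0)          # property v: X = kY gives RV = 1
```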

The RV-coefficient thus evolved from a geometrical interpretation as a
measure  for  the  comparison  of  subspaces.  This  bears  some  similarity  to  the 
relationship  between  subspaces  of  a  given  space,  formalized  using  the  singular 
value  decomposition  (SVD). 

In particular, the orthogonal Procrustes problem [32] analyses the
possibility of rotating a given data matrix $X$ into another given data matrix $Y$
(both of size $p \times N$). The precise problem is

$$\min_Q \|X - YQ\|_F \quad \text{subject to } Q^T Q = I_N \qquad (31)$$

which is equivalent to maximizing $\mathrm{tr}[X Q^T Y^T]$ subject to the orthogonality
constraint on $Q$. The solution, in terms of the SVD of $Y^T X$, i.e.,

$$Y^T X = U \Sigma V^T \qquad (32)$$

is obtained for $Q = UV^T$, yielding

$$\max_Q\ \mathrm{tr}(X Q^T Y^T) = \mathrm{tr}[\Sigma] \qquad (33)$$

The minimization of the Frobenius norm in the above problem stems from the
fact that if the columns of the data matrices $X$ and $Y$ are displayed in $R^p$, then a
natural measure of the "goodness of fit" between the $N$ point clusters in $R^p$ is

$$D(X,Y) = \sum_{k=1}^{N} \|x_k - y_k\|^2 = \mathrm{tr}[(X-Y)(X-Y)^T] = \|X - Y\|_F^2 \qquad (34)$$
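The SVD solution (32) and (33) of the orthogonal Procrustes problem can be sketched as follows, assuming NumPy. The function name `procrustes_rotation` and the test data, in which $X$ is constructed as an exact rotation of $Y$, are illustrative assumptions.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Orthogonal Q (N x N) minimizing ||X - Y Q||_F, from the SVD of Y^T X."""
    U, sigma, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt, sigma

# Test data (assumption): X is an exact orthogonal transformation of Y.
rng = np.random.default_rng(3)
p, N = 3, 5
Y = rng.standard_normal((p, N))
Q_true, _ = np.linalg.qr(rng.standard_normal((N, N)))
X = Y @ Q_true

Q, sigma = procrustes_rotation(X, Y)
residual = np.linalg.norm(X - Y @ Q, "fro")   # objective of (31) at the solution
attained = np.trace(X @ Q.T @ Y.T)            # left-hand side of (33)
```

Because $X$ was built as a rotation of $Y$, the residual vanishes to rounding error, and `attained` equals the sum of the singular values as stated in (33).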

If  the  problem  is  generalized  by  considering  more  general  transformations 
T,  consisting  of  translations,  rotations,  and  (uniform)  scalings,  then  it  is 


We will next solve the covariance based stochastic realization problem from
an RV-coefficient point of view. We start from given matrices, $X$ and $Y$, and
then search for linear transformation matrices, $L$ and/or $M$, in such a way that
the "images" of the transformed variables are as close as possible. The reason
for this stems from Akaike's stochastic realization theory [3], [12], [13],
where the transformed variables form a minimal information interface, that is,
$L^T X$ should contain all the information about $Y$ that is contained in $X$ and vice
versa. Also, if we were to reproduce $X$ (or $Y$) from the transformed variables
$M^T Y$ ($L^T X$), then we would like to lose as little information as possible in the
approximation.

B.  The  Generalized  Symmetric  Stochastic  Realization  Problem 

Let $X = Y_k^+$ and $Y = Y_k^-$; then $S_{xx} = R^+$, $S_{yy} = R^-$, $S_{xy} = H$, and $S_{yx} = H^T$ are
as defined previously. Furthermore, if we let the ($n \times 1$) state vectors of the
backward and forward realizations be as defined earlier, i.e.,
$z_{k-1} = L^T Y_k^+$ and $x_k = M^T Y_k^-$,
with corresponding diagonal covariance matrices given by

$$E\{z_{k-1} z_{k-1}^T\} = L^T (R^+) L = \Delta_z \qquad (38a)$$

$$E\{x_k x_k^T\} = M^T (R^-) M = \Delta_x \qquad (38b)$$

where $L^T$ and $M^T$ are the ($n \times \infty$) transformation matrices, then the generalized
symmetric stochastic realization problem can now be formulated as the following
constrained optimization problem:

$$\text{(P1):}\quad \max_{L,M}\ RV(L^T Y_k^+, M^T Y_k^-) = \frac{\mathrm{tr}[L^T H M M^T H^T L]}{[\mathrm{tr}(L^T (R^+) L)^2\ \mathrm{tr}(M^T (R^-) M)^2]^{1/2}}$$

$$\text{subject to:}\quad L^T (R^+) L = \Delta_z, \qquad M^T (R^-) M = \Delta_x \qquad (39)$$

By introducing Lagrange multipliers $(\lambda_1, \lambda_2, \ldots, \lambda_n)$ and $(\phi_1, \phi_2, \ldots, \phi_n)$, we
can transform (P1) into the following unconstrained optimization problem

$$\text{(P2):}\quad \max_{L,M}\ \psi(L,M) = \mathrm{tr}[L^T H M M^T H^T L] - \sum_{i=1}^{n} \lambda_i [L^T (R^+) L]_{ii} - \sum_{i=1}^{n} \phi_i [M^T (R^-) M]_{ii} \qquad (40)$$

Then upon taking the derivative of $\psi(L,M)$ with respect to $L$ and $M$ and equating
them to zero, followed by some algebraic manipulations, we get the following
optimality conditions [10]

$$H (R^-)^{-1} H^T L - (R^+) L \Gamma = 0 \qquad (41a)$$

$$H^T (R^+)^{-1} H M - (R^-) M \Gamma = 0 \qquad (41b)$$

where $\Gamma$ is the ($n \times n$) diagonal matrix of squared canonical correlation
coefficients. In addition to (41), we have the following identities [10], [31]

$$\Delta_z \Lambda = \Delta_x \Phi \qquad (42a)$$

$$\Lambda = \Delta_x \Gamma \qquad (42b)$$

$$\Phi = \Delta_z \Gamma \qquad (42c)$$


where $\Lambda = \mathrm{diag}[\lambda_1, \ldots, \lambda_n]$ and $\Phi = \mathrm{diag}[\phi_1, \ldots, \phi_n]$. Notice how
(42) displays the symmetry properties of $RV(L^T Y_k^+, M^T Y_k^-)$. As a result of this
symmetry, we can formulate the solution as given by (41), into the format of a
weighted singular value decomposition [47], [48], which in turn leads to the
following canonical decomposition for $H$

$$H = (R^+) L \Delta_z^{-1/2} \Gamma^{1/2} \Delta_x^{-1/2} M^T (R^-) = [(R^+) L \Delta_z^{-1/2} \Gamma^{1/4}]\,[\Gamma^{1/4} \Delta_x^{-1/2} M^T (R^-)] = \mathcal{O}\,\mathcal{C} \qquad (43)$$

where $\mathcal{O}$ and $\mathcal{C}$ are respectively the observability and controllability matrices.
Upon solving for $L$ and $M$ from the respective expressions for $\mathcal{O}$ and $\mathcal{C}$ in (43),
and substituting these into the state equations, we finally get the desired
solution, i.e.,

$$z_{k-1} = \Delta_z^{1/2} \Gamma^{-1/4} \mathcal{O}^T (R^+)^{-1} Y_k^+ \qquad (44a)$$

$$x_k = \Delta_x^{1/2} \Gamma^{-1/4} \mathcal{C} (R^-)^{-1} Y_k^- \qquad (44b)$$
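The optimality conditions (41), the constraints (38), and the factorization (43) can be illustrated on a finite truncation. The sketch below assumes NumPy, finite (3 x 3) covariance blocks, and a state dimension equal to the block size so that (43) is exact; the function `symmetric_realization`, the construction of $L$ and $M$ through a weighted SVD, and the chosen values of $\Delta_z$ and $\Delta_x$ are illustrative assumptions, not the report's algorithm.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def symmetric_realization(R_plus, R_minus, H, delta_z, delta_x):
    """Return L, M, gamma with L^T R+ L = Delta_z and M^T R- M = Delta_x.

    Uses the SVD of the weighted matrix (R+)^(-1/2) H (R-)^(-1/2).
    """
    U, s, Vt = np.linalg.svd(inv_sqrt(R_plus) @ H @ inv_sqrt(R_minus))
    L = inv_sqrt(R_plus) @ U @ np.diag(np.sqrt(delta_z))
    M = inv_sqrt(R_minus) @ Vt.T @ np.diag(np.sqrt(delta_x))
    return L, M, s ** 2

# Hypothetical finite joint covariance of [Y-; Y+] (assumption for illustration).
rng = np.random.default_rng(4)
Z = rng.standard_normal((6, 6))
joint = Z @ Z.T + 6 * np.eye(6)
R_minus, R_plus, H = joint[:3, :3], joint[3:, 3:], joint[3:, :3]

delta_z = np.array([1.0, 2.0, 3.0])   # arbitrary positive state variances
delta_x = np.array([0.5, 1.0, 1.5])
L, M, gamma = symmetric_realization(R_plus, R_minus, H, delta_z, delta_x)

Dz, Dx, G = np.diag(delta_z), np.diag(delta_x), np.diag(gamma)

# Observability and controllability factors of (43).
O = R_plus @ L @ np.diag(delta_z ** -0.5) @ np.diag(gamma ** 0.25)
C = np.diag(gamma ** 0.25) @ np.diag(delta_x ** -0.5) @ M.T @ R_minus

# Residual of the optimality condition (41a).
res_41a = H @ np.linalg.solve(R_minus, H.T) @ L - R_plus @ L @ G
```

With these choices, `O @ C` reproduces `H` and the residual of (41a) is zero to rounding error, for any positive diagonal $\Delta_z$ and $\Delta_x$.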


Then for selected values of $\Delta_z$ and $\Delta_x$, the optimal value of $RV(L^T Y_k^+, M^T Y_k^-)$ is
given by

$$RV(L^T Y_k^+, M^T Y_k^-) = \frac{\sum_{i=1}^{n} \delta_{zi} \gamma_i \delta_{xi}}{[(\sum_{i=1}^{n} \delta_{zi}^2)(\sum_{i=1}^{n} \delta_{xi}^2)]^{1/2}}$$

It is worth mentioning at this point that the same problem, for fixed $\Delta_z$ and
$\Delta_x$, was originally solved by Akaike [13] from a canonical correlation analysis
point of view, and was later algorithmically extended by Desai and Pal [8]. We
will show next that our approach, although different, is general in the sense
that it covers these previous results as specific cases. This is due to the
flexibility in the choice for $\Delta_z$ and $\Delta_x$.

NORMALIZED  AND  BALANCED  STOCHASTIC  REALIZATIONS 

Here we will make use of the flexibility in the choice for $\Delta_z$ and $\Delta_x$ in
order to characterize different Markovian representations. It is a known fact
in realization theory that equivalent minimal Markovian realizations are related
by a similarity transformation. Thus, if we restrict $\Delta_z$ and $\Delta_x$ to be either the
identity matrix or some function of $\Gamma$, then we can define a set of coordinate
systems which give essentially a unique system representation under sign changes
of the basis vectors. The following Lemma characterizes the existence
conditions for these coordinate systems.

Lemma 1

Given a canonical factorization for $H$ as in (43) with $\Delta_z$ and $\Delta_x$ satisfying
(38), if we let $\Delta_x = I_n$ (or $\Delta_z = I_n$), then the optimal level for $\Delta_z$ ($\Delta_x$) in the
RV-coefficient sense is reached at $\Delta_z = \Gamma$ ($\Delta_x = \Gamma$).

Proof: From the definition of the RV-coefficient, we know that

$$RV = \frac{\sum_{i=1}^{n} \delta_{zi} \gamma_i \delta_{xi}}{[(\sum_{i=1}^{n} \delta_{zi}^2)(\sum_{i=1}^{n} \delta_{xi}^2)]^{1/2}} \le 1 \qquad (46)$$

Multiplying (46) by its denominator and collecting terms, we get

$$A \triangleq \sum_{i=1}^{n} \delta_{zi} \gamma_i \delta_{xi} - \left[\left(\sum_{i=1}^{n} \delta_{zi}^2\right)\left(\sum_{i=1}^{n} \delta_{xi}^2\right)\right]^{1/2} \le 0 \qquad (47)$$

Then, as we increase $A$, the RV-coefficient approaches its upper bound. Hence,
we need to maximize $A$. Let us fix $\Delta_x$, then taking the derivative of $A$ with
respect to $\Delta_z$ and equating it to zero, we get the following optimality
conditions

$$\frac{\partial A}{\partial \delta_{zi}} = \gamma_i \delta_{xi} - \delta_{zi} \left[\frac{\sum_{j=1}^{n} \delta_{xj}^2}{\sum_{j=1}^{n} \delta_{zj}^2}\right]^{1/2} = 0, \quad i = 1, \ldots, n \qquad (48a)$$

Similarly, fixing $\Delta_z$ and operating on $\partial A / \partial \Delta_x = 0$, we get

$$\frac{\partial A}{\partial \delta_{xi}} = \gamma_i \delta_{zi} - \delta_{xi} \left[\frac{\sum_{j=1}^{n} \delta_{zj}^2}{\sum_{j=1}^{n} \delta_{xj}^2}\right]^{1/2} = 0, \quad i = 1, \ldots, n \qquad (48b)$$

Suppose further that $\Delta_x = I_n$ in (48a), then one can verify that the optimal
choice for $\Delta_z$ is to make it equal to $\Gamma$. Similarly, let $\Delta_z = I_n$ in (48b), then
$\Delta_x = \Gamma$ corresponds to the optimal choice, and the lemma follows.
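Lemma 1 can be illustrated numerically by evaluating the right-hand side of (46) for $\Delta_x = I_n$ and comparing the choice $\Delta_z = \Gamma$ against many other positive diagonal choices. This is a minimal sketch assuming NumPy; the function name `rv_optimal` and the sample values of $\gamma_i$ are illustrative assumptions.

```python
import numpy as np

def rv_optimal(gamma, delta_z, delta_x):
    """Right-hand side of (46) for diagonal Delta_z, Delta_x given as vectors."""
    num = np.sum(delta_z * gamma * delta_x)
    return num / np.sqrt(np.sum(delta_z ** 2) * np.sum(delta_x ** 2))

gamma = np.array([0.9, 0.5, 0.2])   # hypothetical squared canonical correlations
ones = np.ones_like(gamma)

# Delta_x = I_n and Delta_z = Gamma, the choice singled out by the lemma.
rv_at_gamma = rv_optimal(gamma, gamma, ones)

# Random positive alternatives for Delta_z with Delta_x = I_n held fixed.
rng = np.random.default_rng(5)
trials = rng.uniform(0.01, 1.0, size=(10000, 3))
rv_trials = np.array([rv_optimal(gamma, dz, ones) for dz in trials])
```

By the Cauchy-Schwarz inequality no trial can exceed `rv_at_gamma`, and any positive multiple of $\Gamma$ attains the same value because (46) is invariant to the scale of $\Delta_z$.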


The following definition extends the normalized realizations of Moore [49]
to the stochastic case. This is possible from Lemma 1 which shows the
optimality of these in the RV-coefficient sense.

Definition 1

A stochastic realization for $\Lambda(k)$ is input-normal if and only if $\Delta_z = I_n$
and $\Delta_x = \Gamma$; output-normal if and only if $\Delta_x = I_n$ and $\Delta_z = \Gamma$; and balanced if and
only if $\Delta_z = \Delta_x = \Gamma^{1/2}$.

The results of Lemma 1 and Definition 1 lead us to the following Theorem
for characterizing normalized and balanced stochastic realizations from the
canonical decomposition of $H$.


Theorem  1 


There exist nonsingular transformation matrices $T_i$, for $i = 1,2,3$, such that
the state vectors

$$z_{k-1}^i = T_i^{-T} \mathcal{O}^T (R^+)^{-1} Y_k^+ \qquad (49a)$$

$$x_k^i = T_i\, \mathcal{C} (R^-)^{-1} Y_k^- \qquad (49b)$$

are transformed under the rule

(50)

for $j = 0,1,2$ and $i = j+1$

into the following realizations:

a)  $i = 1$; input-normal

b)  $i = 2$; balanced

c)  $i = 3$; output-normal


Furthermore, the basis vectors for these coordinate systems are aligned and
differ only by scale factors:

$$T_1^{-T} = \Gamma^{-1/4}\, T_2^{-T} \qquad (51a)$$

$$T_3^{-T} = \Gamma^{1/4}\, T_2^{-T} \qquad (51b)$$


Proof: Let $P_z^i = \Delta_z^{1/2} \Gamma^{-1/4}$ and $P_x^i = \Delta_x^{1/2} \Gamma^{-1/4}$ for $i = 1,2,3$. Then from
Lemma 1 and Definition 1, one can verify that $P_z^1 = \Gamma^{-1/4}$, $P_x^1 = \Gamma^{1/4}$,
$P_z^3 = \Gamma^{1/4}$, and $P_x^3 = \Gamma^{-1/4}$. If we now let $\Delta_z = \Delta_x = \Gamma^{1/2}$, then $P_z^2 = P_x^2 = I_n$. This
shows that $T_i^{-T} = P_z^i$ and $T_i = P_x^i$ both satisfy (50). Furthermore, $T_i$ is a
diagonal matrix with positive elements only, thus invertible. Finally, (51)
follows easily from the fact that $T_2 = I_n$. This completes the proof.


Interestingly, the normalized stochastic realizations not only have proved
to be optimal in the RV-coefficient sense, but also, along with the balanced
stochastic realizations [8], they share the same generic property found in
deterministic realization theory. This property was first introduced by Moore
[49] for deterministic systems, and since it exactly carries over for stochastic
systems, we will omit the details here. In addition, these stochastic
realizations are unique modulo a sign matrix if the canonical correlation
coefficients are distinct, whereas, for scalar processes, a sign symmetry
property is observed. Ramos [10] has recently shown the existence of a
cross-Riccati equation for scalar or symmetric stochastic realizations (this
type of symmetry is related to the condition $\Lambda(k) = \Lambda(-k)$ and should not be
confused with symmetry in the sense of (42)). This cross-Riccati equation is
the stochastic counterpart to a deterministic cross-Gramian equation introduced
by Fernando and Nicholson [50] and further studied in [51], [52]. Further
properties of balanced stochastic realizations, and hence of normalized stochastic
realizations because of their generic property, can be found in [8], [48], [53],
[54].

As a final remark, it is worth mentioning that when
$RV(L^T Y_k^+, M^T Y_k^-) = 1$, the state vectors are related to one another
by a rotation and a scale factor, i.e.,

$$z_{k-1} = \alpha P x_k \qquad (52)$$

where $\alpha$ is a nonzero scalar and $P$ is an orthogonal matrix. This
follows from properties 3, 5, and 6 in the definition of the RV-coefficient
and implies that $z_{k-1}$ has a reversibility property. Interestingly,
Anderson and Kailath [14] introduced this property for self-dual
forward-backward Markovian pairs. A condition for this self-duality is that
the autocovariance function of $\{y_k\}$, $\Lambda(k)$, is symmetric. This is
precisely the condition for the existence of the cross-Riccati equation found
by Ramos [10].
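
The rotation-and-scale remark can be checked numerically. The sketch below
assumes the Robert and Escoufier form of the RV-coefficient that appears in
the problem statements of this section, i.e. tr(Sxy Syx) divided by
[tr(Sxx^2) tr(Syy^2)]^(1/2), evaluated on sample covariances. The function
name `rv_coefficient` and the synthetic data are illustrative assumptions and
do not come from the report.

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV-coefficient of two data sets (rows = variables, columns = samples)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    N = X.shape[1]
    Sxx = Xc @ Xc.T / N          # sample covariance of X
    Syy = Yc @ Yc.T / N          # sample covariance of Y
    Sxy = Xc @ Yc.T / N          # sample cross-covariance
    num = np.trace(Sxy @ Sxy.T)
    den = np.sqrt(np.trace(Sxx @ Sxx) * np.trace(Syy @ Syy))
    return num / den

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 500))                  # hypothetical samples of x_k
P, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal matrix
alpha = -2.5                                       # arbitrary nonzero scalar
Z = alpha * P @ X                                  # z_{k-1} = alpha * P * x_k
rv_rotated = rv_coefficient(X, Z)                  # equals 1 up to rounding
rv_unrelated = rv_coefficient(X, rng.standard_normal((3, 500)))
```

For independent data the coefficient is close to zero, so the two values
bracket the range of the measure.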
C. The Generalized Unsymmetric Stochastic Realization Problem


Canonical correlation based measures are, in general, invariant under
nonsingular transformations of $Y_k^+$ and $Y_k^-$. This property, although
indirectly used by Desai and Pal [8] to prove uniqueness of the canonical
correlations and that realizations obtained from CRA are unique modulo a sign
matrix if the canonical correlations are distinct, is not always a desirable
one. This stems from the fact that the total variance of a set of variables is
not invariant under all nonsingular transformations of the variables, but only
under orthogonal transformations. For instance, if $L$ is an orthogonal
matrix, then $RV(L^T Y_k^+, M^T Y_k^-) = RV(Y_k^+, M^T Y_k^-)$ can be
interpreted as the ability of $Y_k^-$ to predict a linear combination of
$Y_k^+$ which accounts for a large proportion of the total variance of
$Y_k^+$. This situation has appeared recently in the work of Arun and Kung
[9], who introduced the Karhunen-Loeve method (KLM) (see also [39] - [41],
where the same method is called the unweighted principal component algorithm
(UPC)) for obtaining a forward Markovian representation. The idea here is to
optimally approximate the information interface between $Y_k^-$ and $Y_k^+$
via a one-sided Karhunen-Loeve expansion of the forward predictor space. We
will next show that this corresponds to a more general problem in the
RV-coefficient framework.

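In the case of obtaining only a forward Markovian representation such as
$x_k = M^T Y_k^-$, we solve the following optimization problem:

$$\text{(P3): Maximize}\quad RV(L^T Y_k^+, M^T Y_k^-) = \frac{\mathrm{tr}[L^T \mathcal{H} M M^T \mathcal{H}^T L]}{\left[\mathrm{tr}[L^T (R^+) L]^2\, \mathrm{tr}[M^T (R^-) M]^2\right]^{1/2}}$$

$$\text{Subject to:}\quad L^T L = I, \qquad M^T (R^-) M = \Delta_x \qquad (53)$$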
Notice that $L$ does not alter the form of the solution since it is an
orthogonal matrix; however, we will retain it to better illustrate the
properties of the solution and its connection with that of the symmetric
problem. Again, if we introduce Lagrange multiplier matrices
$\Lambda_x = \mathrm{diag}(\lambda_{x1}, \lambda_{x2}, \ldots, \lambda_{xn})$
and $\Psi = \mathrm{diag}(\psi_1, \psi_2, \ldots, \psi_n)$, then we are led to
the maximization of

$$\phi(L,M) = \mathrm{tr}[L^T \mathcal{H} M M^T \mathcal{H}^T L] - \sum_{i=1}^{n} \lambda_{xi}\,[M^T (R^-) M]_{ii} - \sum_{i=1}^{n} \psi_i\,[L^T L]_{ii} \qquad (54)$$

whose optimality conditions are given by

$$\frac{\partial \phi(L,M)}{\partial L} = \mathcal{H} M M^T \mathcal{H}^T L - L \Psi = 0 \qquad (55a)$$

$$\frac{\partial \phi(L,M)}{\partial M} = \mathcal{H}^T L L^T \mathcal{H} M - (R^-) M \Lambda_x = 0 \qquad (55b)$$


After some algebra, one can show that $L$ is given by the solution to the
following eigenvalue-eigenvector problem

$$\mathcal{H} (R^-)^{-1} \mathcal{H}^T L = L \Lambda_x \qquad (56)$$

where $\Lambda_x$ is the diagonal matrix of eigenvalues in descending order of
magnitude. From (55) and (56), the solution for $M$ can be obtained as

$$M = (R^-)^{-1} \mathcal{H}^T L \Lambda_x^{-1/2} \Delta_x^{1/2} \qquad (57)$$
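
As a minimal numerical sketch of (56) and (57), the following code solves the
symmetric eigenproblem for $L$, forms $M$, and leaves the constraints of (53)
to be checked on the result. It assumes finite blocks of the Hankel and
covariance matrices; the function name `unsymmetric_realization`, the random
test matrices, and the choice of state covariance are hypothetical and serve
only to exercise the formulas.

```python
import numpy as np

def unsymmetric_realization(H, R_minus, delta_x):
    """Solve (56) for L and form M as in (57).

    H        : (p, q) finite block of the Hankel covariance matrix
    R_minus  : (q, q) symmetric positive definite covariance of the past
    delta_x  : length-n diagonal of the chosen state covariance
    """
    n = len(delta_x)
    A = H @ np.linalg.solve(R_minus, H.T)        # H (R^-)^{-1} H^T
    A = 0.5 * (A + A.T)                          # remove rounding asymmetry
    eigvals, eigvecs = np.linalg.eigh(A)         # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n]          # keep the n largest
    lam_x = eigvals[idx]
    L = eigvecs[:, idx]                          # orthonormal columns
    scale = np.sqrt(np.asarray(delta_x) / lam_x) # diagonal of Lambda_x^{-1/2} Delta_x^{1/2}
    M = np.linalg.solve(R_minus, H.T @ L) * scale  # scales the columns
    return L, M, lam_x

rng = np.random.default_rng(1)
H = rng.standard_normal((6, 5))                  # stand-in Hankel block
B = rng.standard_normal((5, 5))
R_minus = B @ B.T + 5.0 * np.eye(5)              # stand-in covariance of the past
delta_x = np.array([1.0, 1.0, 1.0])              # state covariance set to I_3
L, M, lam_x = unsymmetric_realization(H, R_minus, delta_x)
```

Passing `delta_x = lam_x` instead of the identity gives the scaling in which
the state covariance equals the eigenvalue matrix.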
and the optimum RV-coefficient is given by

$$RV(L^T Y_k^+, M^T Y_k^-) = \frac{\mathrm{tr}[\Lambda_x \Delta_x]}{\left[\mathrm{tr}[L^T (R^+) L]^2\, \mathrm{tr}[\Delta_x^2]\right]^{1/2}}$$

where $L^T (R^+) L$ is a general $(n \times n)$ matrix; therefore, since
$L^T Y_k^+$ is not a basis for $\mathrm{span}\{Y_k^- \mid Y_k^+\}$, it cannot
be considered a state vector. On the other hand, if we maximize
$RV(L^T Y_k^+, M^T Y_k^-)$ with $M^T M = I_n$ and (38a) as constraints, then
$z_{k-1} = L^T Y_k^+$ corresponds to a backward state vector, but
$M^T Y_k^-$ loses the state vector property.

Finally, from (60) one can see that different choices for $\Delta_x$ lead to
different Markovian realizations. In particular, if we let
$\Delta_x = \Lambda_x$, then (60) is equivalent to the state equation obtained
by Arun and Kung [9] via KLM.


D.  Unification  of  Coordinate-Free  Stochastic  Realization  Algorithms 

Most of the stochastic realization algorithms that yield canonical Markovian
models (also known as coordinate-free representations) are special cases of
the two general problems solved earlier in this section. The main differences
rest upon the type of constraints used and the particular choice for
$\Delta_x$ and $\Delta_z$. In Figure 1, we present a hierarchy of solutions to
the stochastic realization problem from an RV-coefficient point of view. Here,
we start with a general RV-type maximization problem and, depending on the
constraints, one can move towards a symmetric or unsymmetric solution. These
are then classified by algorithms or type of solutions according to the choice
for the state covariance matrices.


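It should be noted that in all cases the parameters $(F,G,H)$ and
$(\Sigma,Q,R,S)$ are obtained from the canonical decomposition of the Hankel
matrix $\mathcal{H}$ (i.e., (43) for the symmetric stochastic realization
problem and (59) for the unsymmetric problem) as follows [8] - [10]:

$$n = \mathrm{rank}[\mathcal{H}] \qquad (62a)$$

$$\Sigma = \Delta_x \qquad (62b)$$

$$(F,G,H)_n = \left([\mathcal{C}^{\leftarrow}\mathcal{C}^{\#} = \mathcal{O}^{\#}\mathcal{O}^{\uparrow}],\ [\mathcal{C}E],\ [E^T\mathcal{O}]\right)_n \qquad (62c)$$

$$(Q,R,S) = \left([\Sigma - F\Sigma F^T],\ [\Lambda(0) - H\Sigma H^T],\ [G - F\Sigma H^T]\right) \qquad (62d)$$

where "$\uparrow$" and "$\leftarrow$" are respectively the shift-up and
shift-left operations on block matrices, "$\#$" denotes the pseudo-inverse for
non-square matrices, and $E = [I_m, 0, 0, \ldots]^T$.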

To show how different choices for the state covariance matrices lead to
different solutions, consider the symmetric stochastic realization problem
with $\Delta_x = I_n$, and let $\Delta_x^* = \Gamma^{-1}$ be the maximum state
covariance matrix of a forward anti-filter model of the form

$$x^*_{k+1} = F x^*_k + w^*_k \qquad (63a)$$

$$y_k = H x^*_k + v^*_k \qquad (63b)$$

where $w^*_k$ and $v^*_k$ are white Gaussian noise processes. Then, since
$x^*_k = \Delta_z^{-1} z_{k-1}$ and $\Delta_x^* = \Delta_z^{-1}$ [4],
$\Delta_z = \Gamma$ must be satisfied, thus leading to the solution by Akaike
[13]. By similar arguments one can verify that $\Delta_z = \Delta_x = I_n$
correspond to the state covariance matrices in Baram's algorithm [11]. The
solutions by Desai and Pal [8] and Arun and Kung [9] follow easily from our
previous results. Since the algorithms are mainly related by scale factors,
the generality of the RV-coefficient approach is immediately noted.

The  use  of  different  RV-coefficient  statistics  (see  Figure  1)  as  a  measure 
for  comparison  between  symmetric  and  unsymmetric  stochastic  realization 
algorithms,  as  well  as  a  tool  for  model  reduction,  is  illustrated  by  Ramos  [10] 
for  a  set  of  streamflow  data  from  the  Nile  River.  His  results  indicate  that  the 
unsymmetric  stochastic  realization  problem  is  more  stable  in  terms  of  the  rank 
of  the  Hankel  matrix,  resulting  in  smaller  reduced-order  models  as  well  as 
better  RV-coefficient  performance. 

However, for prediction purposes, both approaches led to similar results.
These results will be published in a separate paper on stochastic model
reduction.


IV. A DIRECT APPROACH FOR OBTAINING COORDINATE-FREE MARKOVIAN REPRESENTATIONS


A.  Symmetric  Case 


Here we assume that a forward-backward pair of Markovian models of the
innovations representation type is given and characterized by $x^*_k$ and
$z^*_{k-1}$, with respective state covariance matrices $P$ and $N$ (not
diagonal) satisfying a pair of algebraic Riccati equations (see [4], [48],
[54] for such construction). We are then interested in finding a similarity
transformation of the form $x_k = T x^*_k$ and $z_{k-1} = T^{-T} z^*_{k-1}$
such that, in the new coordinate system, the state covariance matrices satisfy
the properties of Definition 1.

Let $x_k = M^T x^*_k$ and $z_{k-1} = L^T z^*_{k-1}$ be the transformed state
vectors. Then we wish to find $L$ and $M$ such that $x_k$ and $z_{k-1}$ are as
close as possible in the RV-sense. This amounts to solving the following
RV-optimization problem.


$$\text{(P4): Maximize}_{L,M}\quad RV(L^T z^*_{k-1}, M^T x^*_k) = \frac{\mathrm{tr}[L^T N P M M^T P N L]}{\left[\mathrm{tr}[L^T N L]^2\, \mathrm{tr}[M^T P M]^2\right]^{1/2}}$$

$$\text{Subject to:}\quad L^T N L = \Delta_z, \qquad M^T P M = \Delta_x \qquad (64)$$

where $E[z^*_{k-1} x^{*T}_k] = NP$ has been shown in [55]. If we let
$Z = NP$, then the solution corresponds to a singular value decomposition of
$\bar{Z} = N^{-1/2} Z P^{-T/2}$, which leads to

$$Z = N L \Delta_z^{-1/2} \Gamma^{1/2} \Delta_x^{-1/2} M^T P \qquad (65)$$

where $\Gamma^{1/2}$ is again the diagonal matrix of canonical correlations
[48]. The correspondence between $L$ and $M$ is shown to be

$$L^T = \Delta_z^{1/2} \Gamma^{-1/2} \Delta_x^{-1/2} M^T Z^T N^{-1} \qquad (66a)$$

$$M^T = \Delta_x^{1/2} \Gamma^{-1/2} \Delta_z^{-1/2} L^T Z P^{-1} \qquad (66b)$$

from which the following identity is immediate

$$L^T Z M = \Delta_z^{1/2} \Gamma^{1/2} \Delta_x^{1/2} \qquad (67a)$$

Now, since $Z = NP$, from (65) one can see that

$$L \Delta_z^{-1/2} \Gamma^{1/2} \Delta_x^{-1/2} M^T = I_n \qquad (67b)$$

must be satisfied. Furthermore, if we recall that $M^T = T$, then for any
choice of $\Delta_x$ and $\Delta_z$ satisfying the properties of Definition 1,
the identity $L = M^{-T}$ holds, which tells us that applying the
transformations $L$ and $M$ simultaneously is equivalent to applying a
similarity transformation $T$ to both the forward and backward models. We now
have the following theorem which establishes this fact.


Theorem 2

Given a canonical decomposition for $Z$ as in (65) with $L$ and $M$ satisfying
the constraints in (64), if we apply a transformation of the form

$$T_i = \Gamma^{j/4} \Delta_x^{-1/2} M^T = \Gamma^{(j-2)/4} \Delta_z^{1/2} L^{-1}, \qquad j = 0,1,2 \ \text{and}\ i = j+1 \qquad (68)$$

to the pair of Markovian models, then the normalized and balanced stochastic
realizations are characterized by

$$x_k = T_i x^*_k \qquad (69a)$$

$$z_{k-1} = T_i^{-T} z^*_{k-1} \qquad (69b)$$

where

a) i = 1: input-normal
b) i = 2: balanced
c) i = 3: output-normal

Furthermore, the transformations $T_i$ differ only by scale factors, i.e.,

$$T_1 = \Gamma^{-1/4} T_2 \qquad (70a)$$

$$T_3 = \Gamma^{1/4} T_2 \qquad (70b)$$

proof: Follows easily from (67), Definition 1, and Lemma 1.
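
The balanced case of the direct approach can be sketched numerically: a
singular value decomposition of $N^{1/2}P^{1/2}$ yields a transformation $T$
under which the forward and backward state covariances become equal and
diagonal. The code below is only an illustration under stated assumptions: the
matrices `P` and `N` are arbitrary symmetric positive definite stand-ins (they
are not solutions of an actual pair of Riccati equations), symmetric square
roots are used, and the function names are hypothetical.

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.T

def balancing_transformation(P, N):
    """Return T and s with T P T^T = T^{-T} N T^{-1} = diag(s)."""
    P_half = sym_sqrt(P)
    N_half = sym_sqrt(N)
    # N^{-1/2} (N P) P^{-1/2} reduces to N^{1/2} P^{1/2} for symmetric roots
    U, s, Vt = np.linalg.svd(N_half @ P_half)
    T = np.diag(np.sqrt(s)) @ Vt @ np.linalg.inv(P_half)
    return T, s

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
P = A @ A.T + np.eye(3)        # stand-in forward state covariance
N = B @ B.T + np.eye(3)        # stand-in backward state covariance

T, s = balancing_transformation(P, N)
T_inv = np.linalg.inv(T)
P_bal = T @ P @ T.T            # forward covariance in the new coordinates
N_bal = T_inv.T @ N @ T_inv    # backward covariance in the new coordinates
```

The squared singular values `s**2` coincide with the eigenvalues of $PN$,
which are invariant under the similarity transformation.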

Remark 1: This direct approach is equivalent to artificially symmetrizing a
system of the form $Z = T \Lambda^{2} T^{-1}$ via a singular value
decomposition [48], [54].

Remark 2: For deterministic systems, the equivalent approach would be to
maximize $RV(L^T \mathcal{O}^T, M^T \mathcal{C})$ subject to
$L^T W_o L = \Delta_o$ and $M^T W_c M = \Delta_c$, where $W_o$ and $W_c$ are
respectively the observability and controllability gramians [56]. This
establishes a tri-equivalence between deterministic balancing, stochastic
balancing, and the symmetric stochastic realization problem, in terms of
solution strategy.
B. Unsymmetric Case

A direct approach is also possible for the unsymmetric problem by simply
solving the following RV optimization problem

$$\text{(P5): Maximize}_{L,M}\quad RV(L^T Y_k^+, M^T x^*_k) = \frac{\mathrm{tr}[L^T \Sigma_{yx} M M^T \Sigma_{xy} L]}{\left[\mathrm{tr}[L^T (R^+) L]^2\, \mathrm{tr}[M^T P M]^2\right]^{1/2}}$$

$$\text{Subject to:}\quad L^T L = I, \qquad M^T P M = \Delta_x$$

where $\Sigma_{yx} = E\{Y_k^+ x^{*T}_k\}$.


Following the same Lagrangian optimization procedure as in the previous
problems, one can easily show that the solution is given by solving the
following eigen-system

$$\Sigma_{yx} M M^T \Sigma_{xy} L - L \Psi = 0 \qquad (72a)$$

$$\Sigma_{xy} L L^T \Sigma_{yx} M - P M \Lambda_x = 0 \qquad (72b)$$

from which the following identity is observed

$$\Psi = \Delta_x \Lambda_x \qquad (73)$$

The solution for $M$ is then given by

$$M^T = \Delta_x^{1/2} \Lambda_x^{-1/2} L^T \Sigma_{yx} P^{-1} \qquad (74)$$

Now, notice that $x^*_k = E\{x_k \mid Y_k^-\} = \mathcal{C}(R^-)^{-1} Y_k^-$,
where $x_k$ is any state vector satisfying (1); then
$\Sigma_{yx} = \mathcal{H}(R^-)^{-1}\mathcal{C}^T = \mathcal{O}P$ and (74)
simplifies to

$$M^T = \Delta_x^{1/2} \Lambda_x^{-1/2} L^T \mathcal{O} \qquad (75)$$

which transforms the state vector into

$$x_k = \Delta_x^{1/2} \Lambda_x^{-1/2} L^T \mathcal{O} x^*_k = \Delta_x^{1/2} \Lambda_x^{-1/2} L^T \mathcal{H} (R^-)^{-1} Y_k^-$$

One can see that (71) is the same state equation obtained for the generalized
unsymmetric stochastic realization problem (see, e.g., $x_k = M^T Y_k^-$ using
$M$ from (57)).

This last result completes the equivalence between the stochastic realization
problem and a direct approach for obtaining coordinate-free Markovian
representations, initiated by Desai and Pal [8].


V.  CONCLUSIONS 

The need for introducing a common statistical measure of information as a
tool for comparison between different stochastic realization algorithms has
led to a unification of Akaike's stochastic realization theory to handle other
forms of multivariate analysis. This has been possible via Robert and
Escoufier's novel RV-coefficient approach, which leads to a natural
interpretation for the stochastic realization problem and can be used to
advantage in the following contexts: 1) for algorithmic development, 2) for
performance evaluation, and 3) for model approximation. Two types of problems
have been identified and solved from an RV-coefficient point of view, namely,
the generalized symmetric stochastic realization problem and its unsymmetric
version. Previous algorithms, which include, among others, the canonical
realization algorithm (CRA) and the Karhunen-Loeve method (KLM), are shown to
be particular solutions to these general problems. In each case, an
RV-coefficient statistic has been presented based on the different choices for
the state covariance matrices.

The normalized realizations found in deterministic realization theory have
been carried over to the stochastic case, showing optimality in the
RV-coefficient sense. Alternatively, the problem of finding a similarity
transformation that brings a pair of innovations representations to a certain
canonical form (coordinate-free) has also been tackled from an RV-coefficient
point of view. This again can be formulated as two general problems which bear
resemblance to the generalized symmetric and unsymmetric stochastic
realization problems. Furthermore, a parallelism between deterministic
balancing, stochastic balancing, and the generalized symmetric stochastic
realization problem, from a solution standpoint, has been established.

Finally, we remark that the RV-coefficient approach provides us with a rich
theory for solving a large class of multivariate problems. In particular,
areas such as signal processing, random fields, discriminant analysis, and
pattern recognition, to name only a few, should benefit from this analytical
tool. Preliminary results addressing this issue have already been presented in
[22], [23], where an abstract representation of the RV-coefficient has been
introduced.
I. INTRODUCTION

In a recent series of papers [1] - [3], a cross Grammian matrix $W_{co}$ was
introduced for both continuous and discrete-time SISO linear dynamical
systems. This matrix, which is the product of the controllability and
observability matrices, i.e., $W_{co} = \mathcal{C}\mathcal{O}$, has some
interesting properties which are of concern in linear systems theory. One such
property is that if $W_{co}$ is of full rank, then the system is observable,
controllable, and minimal. It has also been shown [1] that $W_{co}$ can be
determined from the solution to a matrix Lyapunov equation, and further, that
$W_{co}^2 = W_c W_o$, where $W_c$ and $W_o$ are respectively the
controllability and observability Grammians. In [4], these results were
extended to MIMO systems having a symmetric transfer function matrix
(equivalently, symmetric Markov parameters). More recently [5], the definition
of the cross Grammian has been extended to a more general class of transfer
function matrices referred to as orthogonally symmetric transfer function
matrices. Other properties of $W_{co}$ are given in [6], and its connection to
balanced realizations has been reported recently in [2], [3], [7].


In this note we introduce a cross Riccatian matrix, $\Sigma_{PN}$, for MIMO
linear dynamical stochastic systems with symmetric Markov parameters. This
type of symmetry arises naturally in SISO systems such as univariate ARMA and
state-space models, as well as in MIMO systems representing electrical
circuits and power networks. Other symmetry properties similar to those for
$W_{co}$ are introduced, including the solution to a cross Riccati equation
and its connection to balanced stochastic realizations.
II. PRELIMINARIES. FORWARD-BACKWARDS STOCHASTIC REALIZATIONS

Here we are interested in stochastic processes that have a finite dimensional
Markovian (state-space) representation. Thus, suppose we can represent a given
m-dimensional zero-mean stationary stochastic process $\{y_k\}$ by a forward
state-space model of the form

$$x_{k+1} = F x_k + w^f_k, \qquad y_k = H x_k + v^f_k \qquad (1)$$

where $x_k$ is the $(n \times 1)$ state vector, $y_k$ is the $(m \times 1)$
output vector, and $\{w^f_k\}$ and $\{v^f_k\}$ are zero-mean white Gaussian
noise processes with joint covariance matrix

$$E\left\{\begin{bmatrix} w^f_k \\ v^f_k \end{bmatrix}\begin{bmatrix} (w^f_s)^T & (v^f_s)^T \end{bmatrix}\right\} = \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix}\delta_{ks} \geq 0, \qquad Q \geq 0,\ R > 0 \qquad (2)$$

Furthermore, $w^f_k$ and $v^f_k$ satisfy the forward propagation property

$$E\{x_k (w^f_s)^T\} = E\{x_k (v^f_s)^T\} = 0, \qquad s \geq k \qquad (3)$$

We assume that the parameter triple $(F,G,H)_n$ is minimal, i.e., observable
and controllable, such that the state covariance matrix and its inverse exist,
i.e., $\Sigma = E\{x_k x_k^T\}$ is the unique positive definite solution to
the following matrix Lyapunov equation [8]

$$\Sigma = F \Sigma F^T + Q \qquad (4)$$

It is well known [8] that the output autocovariance matrix $\Lambda(k)$ is
parametrically represented by

$$\Lambda(k) = [H F^{k-1} G]\,1(k) + [G^T (F^{-k-1})^T H^T]\,1(-k) + \Lambda(0)\,\delta_{k0}$$

where

$$1(k) = \begin{cases} 1, & k > 0 \\ 0, & \text{otherwise} \end{cases}$$

$$\Lambda(0) = H \Sigma H^T + R$$

$$G = F \Sigma H^T + S$$

From duality one can also represent $\{y_k\}$ by a backwards state-space model
[9], [10], i.e.,

$$z_{k-1} = F^T z_k + w^b_k, \qquad y_k = G^T z_k + v^b_k, \qquad k = 0,-1,-2,\ldots$$

where $z_k$ is an $(n \times 1)$ state vector evolving in the opposite
direction of time, and $\{w^b_k\}$ and $\{v^b_k\}$ are zero-mean white
Gaussian noise processes with joint covariance matrix

$$E\left\{\begin{bmatrix} w^b_k \\ v^b_k \end{bmatrix}\begin{bmatrix} (w^b_s)^T & (v^b_s)^T \end{bmatrix}\right\} = \begin{bmatrix} Q_b & S_b \\ S_b^T & R_b \end{bmatrix}\delta_{ks} \geq 0, \qquad Q_b \geq 0,\ R_b > 0 \qquad (9)$$

and uncorrelated with $z_s$ for $s \geq k$. In addition,
$\Sigma^{-1} = E\{z_k z_k^T\}$ satisfies the following matrix Lyapunov
equation

$$\Sigma^{-1} = F^T \Sigma^{-1} F + Q_b \qquad (10)$$

while $R_b$ and $S_b$ satisfy

$$R_b = \Lambda(0) - G^T \Sigma^{-1} G = R + H \Sigma H^T - G^T \Sigma^{-1} G \qquad (11)$$

$$S_b = H^T - F^T \Sigma^{-1} G \qquad (12)$$

A pair of models satisfying (1) - (12) is called a forward-backwards dual
pair. These types of models have been studied recently in the light of
balanced stochastic realizations [9] - [12] and in smoothing problems [13],
[14].

III.  SYMMETRIC  STOCHASTIC  REALIZATIONS 

When the stochastic process $\{y_k\}$ is a scalar, or when, given the model,
the triple $(F,G,H)_n$ yields a symmetric stochastic realization, then some
interesting properties common to both the forward and backwards models take
place. These properties are summarized below in a series of Lemmas, along
with their proofs.

Lemma 1:

The following conditions are equivalent for a scalar or symmetric stochastic
realization:

i. $(F,G,H)_n$ is symmetric
ii. $\Lambda(k) = H F^{k-1} G$ is symmetric for all $k \geq 1$
iii. $R^+ = R^- = (R^+)^{1/2}(R^-)^{T/2} = R$
iv. $\bar{\mathcal{H}} = (R^+)^{-1/2}\mathcal{H}(R^-)^{-T/2}$ is symmetric

where $R^-$ and $R^+$ are semi-infinite Toeplitz covariance matrices with
respective first rows $[\Lambda(0), \Lambda(1), \Lambda(2), \ldots]$ and
$[\Lambda(0), \Lambda^T(1), \Lambda^T(2), \ldots]$, $\mathcal{H}$ is a
semi-infinite Hankel matrix with first row
$[\Lambda(1), \Lambda(2), \Lambda(3), \ldots]$, $(A)^{1/2}$ denotes the matrix
square root of $A$, and $A^T$ denotes the transpose of $A$.

proof: A condition for symmetry is that the parameters satisfy the following
relation

$$F = D F^T D \quad \text{and} \quad H = G^T D \qquad (13)$$

where $D$ is a sign matrix. Substituting these into
$\Lambda(k) = H F^{k-1} G$, we get

$$\Lambda(k) = G^T D D (F^T)^{k-1} D D H^T = G^T (F^T)^{k-1} H^T = \Lambda^T(k) = \Lambda(-k) \qquad (14)$$

where $DD = I_n$. From (14), properties (1-4) then become obvious.
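
The symmetry condition (13) is easy to exercise numerically. In the sketch
below a sign-symmetric triple is built from an arbitrary symmetric matrix and
a sign matrix $D$; this particular construction, the dimensions, and the
random values are assumptions made for illustration and are not taken from the
note.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 2
D = np.diag([1.0, -1.0, 1.0, -1.0])    # sign matrix, D @ D = I
S0 = rng.standard_normal((n, n))
S0 = 0.2 * (S0 + S0.T)                  # arbitrary symmetric matrix
F = S0 @ D                              # then D F^T D = D (D S0) D = S0 D = F
G = rng.standard_normal((n, m))
H = G.T @ D                             # second half of condition (13)

def markov(k):
    """Lambda(k) = H F^(k-1) G for k >= 1."""
    return H @ np.linalg.matrix_power(F, k - 1) @ G

# Finite observability and controllability matrices (5 block rows / columns)
O = np.vstack([H @ np.linalg.matrix_power(F, i) for i in range(5)])
C = np.hstack([np.linalg.matrix_power(F, i) @ G for i in range(5)])
```

With this construction every `markov(k)` is a symmetric matrix, and the
observability and controllability matrices are tied together by the sign
matrix through `O == C.T @ D`.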


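Lemma 2:

Given a symmetric stochastic realization characterized by the triple
$(F,G,H)_n$, if $P$ and $N$ are positive definite solutions to a matrix
Riccati equation corresponding respectively to the forward and backwards
Kalman filter, then $\Sigma_{PN} = P^{1/2} N^{T/2}$ satisfies a cross Riccati
equation of the form

$$\Sigma_{PN} = F \Sigma_{PN} F + [G - F \Sigma_{PN} G][\Lambda(0) - H \Sigma_{PN} G]^{-1}[H - H \Sigma_{PN} F] \qquad (15)$$

proof: Since $P = \mathcal{C}(R^-)^{-1}\mathcal{C}^T$ and
$N = \mathcal{O}^T (R^+)^{-1} \mathcal{O}$ [12], we have

$$\Sigma_{PN} = \mathcal{C}(R^-)^{-1/2}(R^+)^{-T/2}\mathcal{O} = \mathcal{C}(R)^{-1}\mathcal{O}$$

or equivalently, by making use of the semi-infinite properties of
$\mathcal{O}$, $\mathcal{C}$, $R^-$, and $R^+$,

$$\Sigma_{PN} = [G, F\mathcal{C}] \begin{bmatrix} \Lambda(0) & H\mathcal{C} \\ \mathcal{C}^T H^T & R^- \end{bmatrix}^{-1/2} \begin{bmatrix} \Lambda(0) & G^T \mathcal{O}^T \\ \mathcal{O} G & R^+ \end{bmatrix}^{-T/2} \begin{bmatrix} H \\ \mathcal{O} F \end{bmatrix} = [G, F\mathcal{C}] \begin{bmatrix} A_1 & 0 \\ A_2 & A_3 \end{bmatrix}^{-1} \begin{bmatrix} A_1 & A_4 \\ 0 & A_5 \end{bmatrix}^{-1} \begin{bmatrix} H \\ \mathcal{O} F \end{bmatrix} \qquad (16)$$

where

$$A_1 = \Lambda^{1/2}(0)$$
$$A_2 = \mathcal{C}^T H^T \Lambda^{-1/2}(0)$$
$$A_3 = [R^- - \mathcal{C}^T H^T \Lambda^{-1}(0) H \mathcal{C}]^{1/2}$$
$$A_4 = \Lambda^{-1/2}(0) G^T \mathcal{O}^T$$
$$A_5 = [R^+ - \mathcal{O} G \Lambda^{-1}(0) G^T \mathcal{O}^T]^{1/2}$$

and by applying the matrix inversion Lemma, along with the symmetry properties
from Lemma 1, we get the desired result.

It should be noted that the above cross Riccati equation shares properties
from both a forward and a backwards innovations representation. Furthermore,
it is the stochastic counterpart to the cross Grammian equation introduced in
[1] for deterministic systems.

Lemma 3:

For a symmetric stochastic realization $(F,G,H)_n$, the following relations
always hold true:

$$\Sigma_{PN}^2 = PN \qquad (17)$$

$$\Sigma_{PN} = PD \qquad (18)$$

$$\Sigma_{PN} = DN \qquad (19)$$

proof: From property (3) of Lemma 1, we have

$$PN = \mathcal{C}(R^-)^{-1}\mathcal{C}^T \mathcal{O}^T (R^+)^{-1} \mathcal{O} = \mathcal{C}(R^-)^{-T/2}[(R^+)^{-1/2}\mathcal{O}\mathcal{C}(R^-)^{-T/2}](R^+)^{-1/2}\mathcal{O} = \mathcal{C}(R)^{-1}\mathcal{O}\mathcal{C}(R)^{-1}\mathcal{O} = \Sigma_{PN}^2$$

and recalling that $\Sigma_{PN} = \mathcal{C}(R)^{-1}\mathcal{O}$ and by using
the identity $\mathcal{C}^T D = \mathcal{O}$, the last two relations follow
easily.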


Lemma 4:

A necessary and sufficient condition for a symmetric system to be observable
and controllable, hence minimal, is that $\mathrm{rank}[\Sigma_{PN}] = n$.

proof: It suffices to show that the nonzero eigenvalues of the weighted Hankel
matrix $\bar{\mathcal{H}}$ are also the eigenvalues of $\Sigma_{PN}$. Let

$$|\Sigma_{PN} - \lambda I_n| = |\mathcal{C}(R)^{-1}\mathcal{O} - \lambda I_n| = 0 \qquad (20)$$

be the characteristic equation for $\Sigma_{PN}$. Then it follows that

$$|\mathcal{C}(R)^{-1}\mathcal{O} - \lambda I_n| = (-\lambda)^{n-p}\,|\mathcal{O}\mathcal{C}(R)^{-1} - \lambda I_p| = (-\lambda)^{n-p}\,|(R)^{-1/2}\mathcal{H}(R)^{-1/2} - \lambda I_p| = (-\lambda)^{n-p}\,|\bar{\mathcal{H}} - \lambda I_p| \qquad (21)$$

where $p > n$ is the dimension of $\mathcal{H}$ and $R$. Hence, $\Sigma_{PN}$
and $\bar{\mathcal{H}}$ have the same nonzero eigenvalues.

Now, since $\mathrm{rank}[\bar{\mathcal{H}}] = \mathrm{rank}[\mathcal{H}] = n$
is a necessary and sufficient condition for minimality (equivalently
observability and controllability), then $\mathrm{rank}[\Sigma_{PN}] = n$
since $\bar{\mathcal{H}}$ has $(p-n)$ zero eigenvalues. This completes the
proof.
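
The determinant identity behind (21) can be checked on finite matrices by
comparing characteristic polynomials. In the sketch below the matrices `C`,
`O`, and `R` are random stand-ins of compatible size (they do not come from an
actual symmetric realization), so the example only illustrates the algebraic
step, not the full lemma.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 2, 5
C = rng.standard_normal((n, p))          # stand-in finite controllability block
O = rng.standard_normal((p, n))          # stand-in finite observability block
B = rng.standard_normal((p, p))
R = B @ B.T + p * np.eye(p)              # symmetric positive definite weight

w, V = np.linalg.eigh(R)
R_inv_half = (V / np.sqrt(w)) @ V.T      # symmetric R^{-1/2}

small = C @ np.linalg.solve(R, O)        # n x n matrix C R^{-1} O
big = R_inv_half @ (O @ C) @ R_inv_half  # p x p weighted matrix R^{-1/2} (O C) R^{-1/2}

char_small = np.poly(small)              # coefficients of det(lambda I_n - small)
char_big = np.poly(big)                  # coefficients of det(lambda I_p - big)
```

The leading `n + 1` coefficients of `char_big` reproduce `char_small`, and the
remaining `p - n` coefficients vanish, which is the statement that the larger
matrix carries `p - n` extra zero eigenvalues.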


Lemma 5:

The Cauchy index of a symmetric stochastic realization $(F,G,H)_n$ is given by
the signature of the cross Riccatian $\Sigma_{PN}$.

proof: The Cauchy index is given by the number of positive real eigenvalues
minus the number of negative real eigenvalues of the Hankel matrix. Hence,
from (21) it is clear that this is equivalent to the signature of
$\Sigma_{PN}$.

The conditions of Lemma 3 imply that the computation of the cross Riccatian
reduces to the problem of determining either $P$ or $N$, whereas Lemmas 4 and
5 reveal that the cross Riccatian contains the same information as the Hankel
matrix but in a more compact form.


Lemma 6:

The eigenvalues of $\Sigma_{PN}^2$ are invariant under similarity
transformations and equal to the squared canonical correlation coefficients
between the past and future of the stochastic process $\{y_k\}$.

proof: Let $T$ be a similarity transformation such that
$(F,G,H)_n \rightarrow (TFT^{-1}, TG, HT^{-1})_n = (\bar{F},\bar{G},\bar{H})_n$,
$P \rightarrow TPT^T = \bar{P}$, and
$N \rightarrow T^{-T}NT^{-1} = \bar{N}$ are the similarity relations between
the original and transformed systems. Then it follows that
$\Sigma_{PN}^2 \rightarrow T\Sigma_{PN}^2 T^{-1} = \bar{\Sigma}_{PN}^2$ are
similar matrices; therefore, they must have the same eigenvalues. This proves
the first part. The second part follows by substituting for
$P = \mathcal{C}(R^-)^{-1}\mathcal{C}^T$ and
$N = \mathcal{O}^T(R^+)^{-1}\mathcal{O}$ in (17) and making use of property
(4) of Lemma 1, followed by a singular value decomposition of
$\bar{\mathcal{H}}$, and finally, applying some simple arguments from [12].

Lemma 7:

Given a symmetric stochastic realization characterized by the triple
$(F,G,H)_n$, if the system is transformed to any one of the following balanced
coordinates

a) input-normal: $P = \Gamma$ and $N = I_n$
b) output-normal: $P = I_n$ and $N = \Gamma$
c) internally balanced: $P = N = \Gamma^{1/2}$

where $\Gamma^{1/2} = \mathrm{diag}(\gamma_1^{1/2}, \gamma_2^{1/2}, \ldots, \gamma_n^{1/2})$
is the diagonal matrix of canonical correlation coefficients, then the
condition $\Sigma_{PN}^2 = \Gamma$ always holds.

proof: Follows by inspection.

Lemmas 6 and 7 are useful in balanced stochastic realization theory
[10] - [12] since finding the balancing transformation amounts to finding an
eigenvalue-eigenvector decomposition for $\Sigma_{PN}^2$. For symmetric
systems, however, one can use the properties of Lemma 3 to find
$\Sigma_{PN}^2$ by solving only one Riccati equation as opposed to two in the
general case. Furthermore, the results of Lemma 7 show that $\Sigma_{PN}$ is
an invariant parameter when the system is in balanced coordinates. We finally
remark that the above symmetry results are the stochastic counterpart to the
results in [1] - [4] for deterministic systems.

With the exception of [10], very little work has been done on symmetric
stochastic realizations. We hope that the above symmetry results, in addition
to those in [10], motivate further work on symmetric balanced stochastic
realizations. In [7], an algorithm has been developed for obtaining
deterministic balanced realizations from the solution to the cross-Grammian
equation. The authors are currently investigating the possibility of
extending this algorithm to use the cross Riccatian for obtaining balanced
stochastic realizations.

CONCLUSIONS 

We have defined a new cross Riccatian matrix, $\Sigma_{PN}$, which contains
properties from both a forward and a backwards innovations representation. It
was shown that $\Sigma_{PN}$ satisfies a cross Riccati equation which is
related to the forward and backwards Riccati equations by a sign matrix. For
balanced stochastic realizations, this implies some computational savings
since one need not solve a pair of Riccati equations while computing the
balancing transformations. Furthermore, the cross Riccatian matrix and the
associated Hankel matrix share some common properties which arise naturally in
realization theory.
The emphasis of this paper is on the geometrical theory behind the stochastic
realization problem, and its application to (stochastic) model reduction. It
is shown that known stochastic realization algorithms based on canonical
correlations or principal components can be analyzed in a common framework
using certain operator valued measures. This analysis is based on the concept
of the RV-coefficient as introduced in multivariable statistics as a measure
for the similarity between two sets of random variables. It is shown that the
theory has some parallels with the foundations of (geometric) quantum
mechanics.


INTRODUCTION 

The stochastic realization problem deals with the quest for a finite
dimensional Markovian representation for a stochastic process from the known
covariance information. If the covariances of the intervening random variables
are exactly known, then we deal with the exact stochastic realization problem,
which has received great attention [1,3,4,5,10]. This is primarily due to its
fundamental importance in system identification, digital filtering, signal
processing, and time series modeling [16]. For many applications the Markovian
representation or state space model may be too complex due to its high
dimensionality, thus barring efficient computational management.

This partially motivates the search for smaller dimensional Markovian
realizations which approximate the original (or exact) one in some sense. The
high dimensionality of the original (exact) state space model can, for
instance, be caused by the incorporation of weakly coupled superfluous state
components. These components may mask any underlying physical principles or
tendencies hidden in the dynamics.

Another difficulty with the stochastic realization problem is the necessity of
the exact covariance information. In most practical situations, all one has
available is an estimate of the covariances based on the real data (i.e.,
sample covariances). Not only would the noise fluctuations in the covariance
structure lead to models of high dimensions, but, what is more essential, the
sample covariance sequence may not be positive-real. In such a case, the exact
realization algorithm applied to inexact data may not have a solution at all
[3].

Akaike has developed a stochastic realization theory based on the information
interface between the past and the future of a time series and the concepts of
canonical correlation analysis. Here the pair of canonical vectors with
positive canonical correlations form a minimal interface between the past and
the future of the process. It is shown that these two canonical vectors are
the states of the forward and backward innovations representation (extreme
Markovian representations) introduced by Faurre [10] and are also basis
vectors of what Akaike calls the forward and backward predictor spaces,
respectively. Moreover, the canonical correlation coefficients provide a
rational basis for obtaining the reduced order model. Baram [4] extended
Akaike's result to the nonstationary case and considered the model reduction
problem by deleting the insignificant singular values from a singular value
decomposition of the Hankel covariance matrix. A similar algorithm for
obtaining the stochastic realization and reduced order model, called the
Canonical Realization Algorithm (CRA), was introduced by Desai and Pal [5]. It
is a further extension to Akaike's work by introducing the concept of balanced
stochastic realizations. Here a forward-backwards dual pair is obtained with
state covariance matrices being equal and diagonal, and these diagonal
elements are the canonical correlation coefficients. A forward-backward pair
that satisfies these conditions is said to be in balanced form. These
balancing conditions are the stochastic counterpart to the deterministic
balancing conditions originally proposed by Moore [17] and for the time
varying case by Verriest and Kailath [23].

Recently Arun and Kung [3] introduced the Karhunen-Loeve Method (KLM), which is also
grounded in multivariate statistics and, in the context used here, is shown
to be equivalent to Principal Components of instrumental variables as discussed
in Rao [21]. KLM optimizes the approximation of the information interface
between the past and the future of a stochastic process via a one-sided
Karhunen-Loeve Expansion (KLE) of the predictor space. A break in the diagonal elements
of the state covariance matrix dictates the reduced order model. As opposed to
CRA, KLM is not symmetric in the sense that either a forward or a backward
Markovian representation is obtained. This would constitute two separate
problems. Arun and Kung [3] also pointed out that CRA is not well suited for model
reduction due to the smallness of the canonical correlations and the fact that it
works with unapproximated data. Despite these theoretical arguments, no direct
comparison between KLM and CRA has been reported that justifies Arun and Kung's argument,
nor has there been any statistical measure of information common to both models
that would help the modeler to discriminate between CRA and KLM when dealing with
the model reduction problem.

Ramos and Verriest explored in an earlier paper [19] the combination of the
canonical correlation and principal component analyses, given the exact
covariance information, in a common framework using the RV-coefficient introduced by
Escoufier [6]. It was shown that this common statistical measure of information
provides a rationale for drawing inferences about the performance of the
algorithms. In the RV-coefficient framework, linear transformations on sets of random
variables are found, so that the transformed sets are as similar as possible in a
certain sense. The RV-measure attains values in [0,1] and the closer to one, the
greater the similarity of the sets of random variables.

The motivation for this paper is to gain more geometrical insight into the
problem, in order to adapt the method for use in the approximate stochastic
realization case, based on real data.

In the following, we therefore briefly summarize the stochastic realization
problem to set the necessary background. This is followed by a brief discussion of the
canonical correlation and principal component analyses. Next, the RV-coefficient
is introduced in a geometrical context, for the exact covariance data and for the
real data cases. A unifying framework is developed, and a connection is made
with some of the foundations of quantum mechanics. Finally, the RV-technique is
illustrated for the CCA and PCA.

THE DISCRETE STOCHASTIC REALIZATION PROBLEM

Given the covariance sequence $\Lambda(k)$ of a rational, stationary, zero mean, discrete
time vector sequence $\{y_k\}$, the stochastic realization problem consists in finding
a Markovian representation of the form

$$x_{k+1} = F x_k + w_k$$

$$y_k = H x_k + v_k$$

where $\{w_k\}$ and $\{v_k\}$ are white Gaussian noises with

$$E\left\{ \begin{bmatrix} w_k \\ v_k \end{bmatrix} \begin{bmatrix} w_l' & v_l' \end{bmatrix} \right\} = \begin{bmatrix} Q & S \\ S' & R \end{bmatrix} \delta(k,l)$$

such that $E(y_{k+n} y_n') = \Lambda(k)$. Here $\delta(k,l)$ is the Kronecker delta.

The solution to this problem is well known [10]. Given the covariance sequence,
one forms the (infinite) Hankel matrix

$$H = \begin{bmatrix} \Lambda(1) & \Lambda(2) & \cdots \\ \Lambda(2) & \Lambda(3) & \cdots \\ \vdots & \vdots & \end{bmatrix} \tag{4}$$


The time sequence is rational if and only if this Hankel matrix has finite rank
(say n). It follows then from the deterministic realization theory [10] that the
order of any minimal Markovian representation of $\{y_k\}$ is precisely n, and a
triple (F,G,H) can be constructed such that

$$\Lambda(k) = H F^{k-1} G + \Lambda_0 \delta_{k0}, \qquad k > 0$$

$$\Lambda(k) = \Lambda'(-k), \qquad k < 0$$


where, in order to have a Markovian representation, the following needs to be
satisfied:

$$P - FPF' = Q \tag{6}$$

$$G - FPH' = S \tag{7}$$

$$\Lambda_0 - HPH' = R \tag{8}$$

$$\begin{bmatrix} Q & S \\ S' & R \end{bmatrix} \ge 0 \tag{9}$$

Here P is interpreted as the state covariance matrix

$$P = E(x_k x_k') \tag{10}$$

The triple (F,G,H) together with $\Lambda_0$ does not uniquely specify the covariances
P, Q, S, and R. However, P completely specifies Q, S, and R, and therefore
characterizes the Markovian representation. Furthermore, note that any minimal
realization of the covariance sequence is unique, modulo a similarity transformation.
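Conditions (6) to (8) and the finite-rank property of the Hankel matrix can be checked numerically. The following is a minimal sketch under stated assumptions: the second-order model matrices are illustrative values chosen here (they are not taken from the paper), the helper names are hypothetical, and the infinite Hankel matrix is truncated to a 5 x 5 block matrix.

```python
import numpy as np

def stationary_state_covariance(F, Q):
    """Solve P - F P F' = Q, i.e. condition (6), by vectorization."""
    n = F.shape[0]
    lhs = np.eye(n * n) - np.kron(F, F)   # row-major vec(F P F') = (F kron F) vec(P)
    return np.linalg.solve(lhs, Q.reshape(-1)).reshape(n, n)

def block_hankel(F, G, H, rows, cols):
    """Truncated Hankel matrix whose block (i, j) is Lambda(i+j+1) = H F^(i+j) G."""
    blocks = [[H @ np.linalg.matrix_power(F, i + j) @ G for j in range(cols)]
              for i in range(rows)]
    return np.block(blocks)

# Illustrative (assumed) forward model of order n = 2 with scalar output.
F = np.array([[0.8, 0.1], [0.0, 0.5]])
H = np.array([[1.0, 0.5]])
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
S = np.array([[0.1], [0.0]])
R = np.array([[1.0]])

P = stationary_state_covariance(F, Q)      # state covariance, condition (6)
G = F @ P @ H.T + S                        # condition (7)
Lambda0 = H @ P @ H.T + R                  # condition (8)
noise_cov = np.block([[Q, S], [S.T, R]])   # must be positive semidefinite, condition (9)

hankel = block_hankel(F, G, H, rows=5, cols=5)
hankel_rank = np.linalg.matrix_rank(hankel)
print("rank of truncated Hankel matrix:", hankel_rank)
```

For a minimal model the rank of the truncated Hankel matrix equals the state dimension, which is the property used to determine the order n from the covariance sequence alone.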

In the stationary sequence realization problem, the past and the future can be
brought on equal footing, since the statistical properties of the given sequence
are invariant with respect to time inversion. There are, thus, two classes of
representations: the forward and the backward representations. The forward
Markovian representations have the causal structure, or forward propagation
property:

$$E(x_t v_s') = 0 \qquad \forall s > t$$

Similarly, the backward models have the anti-causal structure

$$E(x_t v_s') = 0 \qquad \forall s < t$$


Denoting by $\Pi$ the set of state covariances defining a forward Markovian model
with triple $(F,G,H)$, and by $\bar{\Pi}$ the corresponding set of state covariances for the
backward models with triple $(\bar{F},\bar{G},\bar{H})$, the following can be asserted about the
sets $\Pi$ and $\bar{\Pi}$. Note that both sets only contain positive definite matrices.

1. Both sets are closed, bounded, convex, and have two extreme points:

$$P_* \le P \le P^*, \qquad P^* > P_* \qquad \text{for } \Pi$$

$$\bar{P}_* \le \bar{P} \le \bar{P}^*, \qquad \bar{P}^* > \bar{P}_* \qquad \text{for } \bar{\Pi}$$

2. There exists an order isomorphism (matrix inversion) between the partially
ordered sets $(\Pi,\le)$ and $(\bar{\Pi},\ge)$. Thus,

$$P \in \Pi \iff P^{-1} \in \bar{\Pi} \tag{11}$$

$$\bar{P}^* = P_*^{-1} \tag{12}$$

$$\bar{P}_* = (P^*)^{-1} \tag{13}$$

It can be shown [1] that the extreme points $P_*$ and $P^*$ respectively correspond to
the forward and backward innovations representations.


APPROACHES  TO  MARKOVIAN  MODELING 

Assume that the stochastic time-series $\{y_k\}$ is Gaussian (with zero mean). The
relevant random variables are then in the Hilbert space $L_2(\Omega,\mathcal{B},P)$ and conditional
expectations can be interpreted as orthogonal projections onto subspaces
$L_2(\Omega,\mathcal{B}_{\{y_k\}},P)$.


For the time-series $\{y_k\}$ define as in [10] the infinite vectors

$$Y_k^+ = (y_k', y_{k+1}', \ldots)' \qquad \text{the future} \tag{14}$$

$$Y_k^- = (y_{k-1}', y_{k-2}', \ldots)' \qquad \text{the past} \tag{15}$$

and define the semi-infinite covariance matrices

$$H = E\left(Y_k^+ (Y_k^-)'\right) \tag{16}$$

$$R^+ = E\left(Y_k^+ (Y_k^+)'\right) \tag{17}$$

$$R^- = E\left(Y_k^- (Y_k^-)'\right) \tag{18}$$

Within this representation, the forward and backward predictor subspaces are

$$X_k = \mathrm{Span}(Y_k^+ \mid Y_k^-) \tag{19}$$

$$Z_{k-1} = \mathrm{Span}(Y_k^- \mid Y_k^+) \tag{20}$$

Here $(A \mid B)$ denotes the projection of span(A) onto the Hilbert space spanned by the
components of B. From the projection theorem, one obtains

$$X_k = H (R^-)^{-1} Y_k^- \tag{21}$$

$$Z_{k-1} = H' (R^+)^{-1} Y_k^+ \tag{22}$$

Under the assumption that a finite dimensional Markovian representation exists,
finite dimensional bases $x_k$ and $z_k$ can be found such that they respectively
generate $X_k$ and $Z_k$, i.e., there exist operators A and B such that

$$x_k = A' X_k = A' H (R^-)^{-1} Y_k^- = M' Y_k^- \tag{23}$$

$$z_{k-1} = B' Z_{k-1} = B' H' (R^+)^{-1} Y_k^+ = L' Y_k^+ \tag{24}$$

Since the basis vectors must be linearly independent, their covariances must be
nonsingular. We may impose the constraints

$$E\, x_k x_k' = \Lambda_x = \mathrm{diag}(\delta_{x_i}) \tag{25}$$

$$E\, z_{k-1} z_{k-1}' = \Lambda_z = \mathrm{diag}(\delta_{z_i}) \tag{26}$$


by suitably redefining A and B (multiplication by an orthogonal matrix on
the right). The stochastic realization problem is thus equivalent to the problem
of finding matrices L and M such that the similarity between the predictor spaces
is as large as possible, while satisfying the constraints (25) and (26). For the
stochastic model reduction problem, different statistical techniques have been
proposed, notably the canonical correlation method and the principal component
analysis.

The application of the Canonical Correlation Analysis to the realization problem
has been pioneered by Akaike [1], Baram [4], and Desai and Pal [5], who tied this
in with balancing techniques. By the above analysis, the state is in fact the
information interface between past and future. The canonical correlations lead,
therefore, to a natural distance measure between the past and the future, which
in the Gaussian case is nothing else than the Kullback-Leibler mutual
information.


Arun and Kung [3] used a different approach. In [3], the past is treated as a set of
instrumental variables for predicting the future. (The Principal Component Analysis
is also named the Instrumental Variable, or Karhunen-Loeve, method [21].) The model
reduction method is then based on retaining the components of the past that
contribute significantly to its efficiency in predicting the future.

The two methods are obviously not equivalent, and problems with and
criticisms of each have been pointed out. First of all, the interpretation of the
canonical correlations has been questioned [3]. A relatively strong canonical
correlation between two components is possible even though they may not extract
significant portions of the (total) variance. Another criticism is that the
canonical correlations are rather small numbers to compare, since they must
obviously belong to the interval (0,1]. This is probably not such a sharp
disadvantage, since only relative magnitudes are important anyway. Finally, since the
canonical correlation technique is an exact covariance realization technique, and
only sample covariances are available in any real situation, the robustness of
the realization procedure may be at stake. On the other hand, the proponents of
the canonical correlation analysis point out the nice way in which the connection
between $Y^-$ and $Y^+$ is displayed. This paper will hopefully resolve such a dispute
between opponents and proponents of either method, by giving a common framework
which puts both analyses on equal footing. As was already remarked in the
introduction, the analyst can then decide which features are important for the
situation at hand, and make an appropriate decision.


STATISTICAL  PRELUDE:  THE  GEOMETRIC  NATURE  OF  MULTIVARIATE  ANALYSIS 


Let X be a random vector of dimension p over a probability space $(\Omega,\mathcal{B},P)$ with
finite second order moments. Thus X belongs to the Hilbert space $L_2^p(\Omega,\mathcal{B},P)$. In
this setting, the Principal Component Analysis consists of finding a
transformation in $L_2^p(\Omega,\mathcal{B},P)$ taking the vector X into Y such that the components of Y are
uncorrelated and

$$E\, Y_i Y_j = \lambda_i \delta_{ij} \tag{27}$$

where $\lambda_i$ is the i-th eigenvalue of the matrix $E\,XX'$. Note that if $X = AY$, then

$$E\, XX' = A \Lambda A' \tag{28}$$

and thus

$$E\, X_i X_j = \sum_{k=1}^{p} a_{ik} \lambda_k a_{jk} \tag{29}$$

which "explains" the (co)variance of the components of X.
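Relations (27) to (29) can be illustrated with an eigendecomposition of a covariance matrix. This is a minimal sketch, assuming a randomly generated 3 x 3 covariance matrix as a stand-in for $E\,XX'$; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 3))
cov_x = mixing @ mixing.T                 # stand-in for E XX' (symmetric, positive semidefinite)

# cov_x = A diag(eigvals) A' with orthogonal A, as in (28)
eigvals, A = np.linalg.eigh(cov_x)

# Covariance of Y = A'X: diagonal, with the eigenvalues on the diagonal, as in (27)
cov_y = A.T @ cov_x @ A

# Element-wise reconstruction of E X_i X_j from the principal components, as in (29)
reconstructed = sum(eigvals[k] * np.outer(A[:, k], A[:, k]) for k in range(3))
```

Each term in the final sum is the contribution of one principal component to the covariance of X, which is the sense in which the components "explain" the (co)variance.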


The Canonical Correlation Analysis on two random vectors $X^{(1)}$ and $X^{(2)}$, not
necessarily of the same dimension, finds linear combinations of $X^{(1)}$ and $X^{(2)}$
with the largest covariances in a certain way. It, therefore, "explains the
connection" between the two random vectors.


The important remark that we want to make here is that both techniques rely on
the Hilbert space structure: the inner product in the space is derived from the
covariance. In the statistical literature, a new measure has been introduced by
Escoufier [6] which is scalar and expresses the "similarity" between subspaces.
Let X be the p dimensional random vector partitioned as

$$X = \begin{bmatrix} X^{(1)} \\ X^{(2)} \end{bmatrix} \tag{30}$$

with covariance matrix

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \tag{31}$$

Escoufier defined the following scalar measures [3]:

$$\mathrm{COVV}(X^{(1)},X^{(2)}) = \mathrm{Tr}\,\Sigma_{21}\Sigma_{12} \tag{32}$$

$$\mathrm{VARV}(X^{(1)}) = \mathrm{Tr}\,\Sigma_{11}^2 \tag{33}$$

$$\mathrm{RV}(X^{(1)},X^{(2)}) = \frac{\mathrm{COVV}(X^{(1)},X^{(2)})}{[\mathrm{VARV}(X^{(1)})\,\mathrm{VARV}(X^{(2)})]^{1/2}} \tag{34}$$

The properties of the above measures are analogous to the usual covariance
properties since for any other related $X^{(3)}$

1. $\mathrm{COVV}(X, X^{(3)})$ [...]

2. If $X^{(1)} = AX^{(2)}$, where $A \in R^{n_1 \times n_2}$, $AA' = kI$ and $n_1 < n_2$, then

$$\mathrm{COVV}(X^{(2)},X^{(3)}) = k\,\mathrm{COVV}(X^{(1)},X^{(3)})$$

Note that it follows at once from 2 that the RV-measure is invariant with respect
to scaling and orthogonal transformations. However, the introduced scalar
measures are not quite the same as a covariance, since if $X^{(1)}$ and $X^{(2)}$ are scalar,
then

$$\mathrm{COVV}(x_1,x_2) = (E\,x_1 x_2)^2$$

Other measures have been introduced previously, e.g., Hotelling's Vector
Correlation Coefficient (VCC), defined as [12]


(VCC) ' 


'12 


21 


22 


''ll'  1=22' 


(35) 


The VCC has several drawbacks. First of all, it does not support the
degenerate case $X^{(1)} = ((X_1^{(1)})' \mid (X^{(2)})')'$. The RV-measure on the other hand supports this
degenerate case. Furthermore, any intuitive notion of similarity between two
random vectors (of arbitrary, not necessarily equal, dimension) should be nonzero
as long as there is some correlation between at least one of the components of
each vector. Moreover, it may be advantageous to let the difference in
dimensionality of each vector influence the similarity. An increasing difference in
dimensionality should decrease the similarity. In order to compare the two
defined measures in this respect, let the covariance matrix be partitioned as


$$\Sigma = \begin{bmatrix} I_{n_1} & \begin{bmatrix} \Delta \\ 0 \end{bmatrix} \\ \begin{bmatrix} \Delta & 0 \end{bmatrix} & I_{n_2} \end{bmatrix}, \qquad n_1 \ge n_2 \tag{36}$$

where

$$\Delta = \mathrm{diag}(\delta_1 \cdots \delta_{n_2})$$

then

1. $$\mathrm{RV}(X^{(1)},X^{(2)}) = \frac{1}{\sqrt{n_1 n_2}} \sum_{i=1}^{n_2} \delta_i^2 \tag{37}$$

2. If $n_2 = 1$, then

$$\mathrm{RV}(X^{(1)},X^{(2)}) = \frac{\delta_1^2}{\sqrt{n_1}} \tag{38}$$

The right hand side of (38) decreases as the dimension $n_1$ increases, thus
indicating the decreasing "similarity" of the random vectors as $n_1$ increases.
The Vector Correlation Coefficient gives respectively

1. $$(\mathrm{VCC})^2 = \prod_{i=1}^{n_2} \delta_i^2 \tag{39}$$

2. VCC is independent of $n_1$ (if $n_1 \ge n_2$).

These properties may provide the prime motivation to consider RV as a measure of
similarity, but are not sufficient for a justification to take it out of the
ad hoc status.
The theoretical justification is due to Escoufier. The rigorous mathematical
construction is as follows. First, characterize the random vector X in $L_2^n$ by an
operator on $L_2$. Next, show that the set of such operators has the structure of a
Hilbert space under the inner product COVV(.,.). Then the usual induction

inner product $\rightarrow$ norm $\rightarrow$ distance

leads to a rigorous definition of the "similarity." In particular, we have the
following definition of the operator associated to X.

Definition: Associated Operator. With $X \in L_2^n$ associate an operator $U_X : L_2 \rightarrow L_2$
defined by

$$\forall y \in L_2 : \quad U_X(y) = \sum_{i=1}^{n} [E(x_i y)]\, x_i \tag{40}$$

The following propositions follow then at once from the definition. The proofs
are in [3].

Proposition 1: $U_X$ characterizes X in the sense of eigenvalues and eigenvectors.
More precisely, if $\Sigma$ is the covariance matrix of X, then

1. $\forall z \in R^n$ s.t. $\Sigma z = \lambda z$ (41)
the random variable $y = X'z$ satisfies $U_X(y) = \lambda y$.

2. Conversely, $\forall y \in L_2$ such that $U_X(y) = \lambda y$,
$y = X'z$ where $\Sigma z = \lambda z$.
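Proposition 1 has an exact finite-sample analogue, which makes it easy to check. In the sketch below, which is an illustration under stated assumptions rather than the construction used in the proofs, the expectation is replaced by an average over N samples, so that a random variable becomes a vector of N sample values; the names `expect` and `associated_operator` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 500
X = rng.normal(size=(n, N))        # row i holds the N sample values of the variable x_i

def expect(u, v):
    """Sample stand-in for E(u v)."""
    return np.mean(u * v)

def associated_operator(X, y):
    """U_X(y) = sum_i E(x_i y) x_i, with E replaced by the sample average."""
    return sum(expect(X[i], y) * X[i] for i in range(X.shape[0]))

sigma = X @ X.T / N                # sample covariance (zero mean assumed)
eigvals, eigvecs = np.linalg.eigh(sigma)
lam, z = eigvals[-1], eigvecs[:, -1]

y = X.T @ z                        # the random variable y = X'z
lhs = associated_operator(X, y)    # should equal lam * y
```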


Proposition 2:

1. $U_X$ is a Hilbert-Schmidt operator.

2. The set of operators $\{U_X\}$ forms a Hilbert space under the inner product

$$\langle U_1, U_2 \rangle = \sum_{\mathrm{CONS}} E[U_1(\psi_k)\, U_2(\psi_k)] \tag{43}$$

where "CONS" stands for a complete orthonormal system of basis vectors $\{\psi_k\}$
in the space $L_2$ (which is separable). Note that one has also

$$\langle U_{X_1}, U_{X_2} \rangle = \sum_{\mathrm{CONS}_i} \sum_{\mathrm{CONS}_j} \mu_i \lambda_j \langle \phi_i, \psi_j \rangle^2 \tag{44}$$

where the $(\mu_i,\phi_i)$ and $(\lambda_j,\psi_j)$ solve the eigenproblems (41) associated
respectively with $X_1$ and $X_2$.


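Proposition 3:

$$\langle U_{X_1}, U_{X_2} \rangle = \mathrm{Tr}\,\Sigma_{12}\Sigma_{21} = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \left( E\, x_i^{(1)} x_j^{(2)} \right)^2$$

This proposition follows from the invariance of the inner product (43) with
respect to orthogonal transformations.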


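Define now a "correlation coefficient" $\gamma(U_1,U_2)$ by

$$\gamma(U_1,U_2) = \frac{\langle U_1, U_2 \rangle}{\|U_1\|\,\|U_2\|}$$

Note that the special case $\gamma = 0$ occurs iff the eigenspaces of $U_1$ and $U_2$ are
orthogonal to each other, while $\gamma = 1$ iff $U_1$ and $U_2$ have the same eigenspace, the
same eigenvectors, and proportional eigenvalues. The correlation between the
operators associated with the random vectors is thus

$$\gamma(U_{X_1}, U_{X_2}) = \frac{\mathrm{Tr}\,\Sigma_{12}\Sigma_{21}}{\sqrt{\mathrm{Tr}\,\Sigma_{11}^2\;\mathrm{Tr}\,\Sigma_{22}^2}} \tag{47}$$

which is defined as the RV-coefficient between $X_1$ and $X_2$.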


REAL  DATA  CASE  (SIGNAL  PROCESSING  CONTEXT) 


Consider the case where a p-dimensional vector is measured n times. It is
customary to say that we have p variables and n samples. Let these then be organized
in a data matrix

$$X = [x_1 \cdots x_n] \in R^{p \times n}$$

This can also be represented geometrically as a configuration C(X) of n points
in $R^p$ (and of course dually as a configuration of p points in $R^n$, but only the
first will be discussed). Let the distance between points in $R^p$ be derived from
the metric

$$\langle x_i, x_j \rangle_Q = x_i' Q x_j$$

with Q positive definite. Then $Q = LL'$, leading to the equivalent interpretation
as the Euclidean metric on the transformed variables

$$y = L'x$$

Different weights may be attached to the different samples (e.g. according to
some measure of the accuracy of the obtained measurement). Let $p = \{p_i,\ i = 1,
\ldots, n\}$ be such a set of weights for which $p_i > 0$ and

$$\sum_{i=1}^{n} p_i = 1$$

(i.e. $\{p_i\}$ is a probability measure). A weighted average of the data points can
be defined

$$\langle x \rangle_p = \sum_{i=1}^{n} p_i x_i$$

Whenever one has the configuration C(X) together with a measure $\{p_i\}$ it will be
advantageous to consider the "centered" data matrix whose columns are
$\tilde{x}_i = x_i - \langle x \rangle_p$. Its importance, of course, is to obtain translation invariant properties.


The classical multivariable methods consist now in a search for linear
transformations of the original variables that maximize (under some constraints) the
"closeness" of the configurations C(X) and C(Y). But how can a distance between
configurations of points be defined? Ideally, such a measure should be invariant
with respect to translation, rotation and scaling, and this should thus a priori
hold for the "self-distance" or distance of the relative positions in C(X). It
is easily seen that the Euclidean distance matrix (i.e., with ij-element
$((x_i - x_j)'Q(x_i - x_j))^{1/2}$) is a (matrix-valued) measure which is invariant with respect
to translation and rotation, but not with respect to scaling. On the other hand,
the matrix-valued measure

$$\frac{S_p(X)}{\sqrt{\mathrm{Tr}\,S_p(X)^2}} \tag{51}$$

where $S_p(X) = \mathrm{diag}(\sqrt{p})\,X'X\,\mathrm{diag}(\sqrt{p})$ is invariant with respect to translation,
rotation, and scaling. Note that for A, B in $R^{n \times n}$, $\langle A,B \rangle = \mathrm{Tr}\,A'B$ defines an
inner product on $R^{n \times n}$ for which the induced norm is the Frobenius or F-norm. The
distance between the (measured) configurations (C(X),p) and (C(Y),p) is then
induced by this norm, i.e.,

$$d^2\big((C(X),p),(C(Y),p)\big) = \left\| \frac{S_p(X)}{\sqrt{\mathrm{Tr}\,S_p(X)^2}} - \frac{S_p(Y)}{\sqrt{\mathrm{Tr}\,S_p(Y)^2}} \right\|_F^2 = 2\,(1 - \mathrm{RV}(X,Y))$$

if one defines

$$\mathrm{RV}(X,Y) = \frac{\mathrm{Tr}\,S_p(X)\,S_p(Y)}{\sqrt{\mathrm{Tr}\,S_p(X)^2\;\mathrm{Tr}\,S_p(Y)^2}} = \frac{\mathrm{Tr}\,S_{12}S_{21}}{\sqrt{\mathrm{Tr}\,S_{11}^2\;\mathrm{Tr}\,S_{22}^2}} \tag{53}$$

Since S can be interpreted as a sample covariance, this last definition shows
the neat similarity between the signal processing (53) and stochastic realization
(47) contexts.
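The identity between the squared distance of the normalized configurations and 2(1 - RV) can be verified on synthetic data. The sketch below assumes uniform sample weights and randomly generated data matrices with variables in rows and samples in columns; all function names are illustrative.

```python
import numpy as np

def weighted_gram(X, weights):
    """diag(sqrt(p)) Xc' Xc diag(sqrt(p)) for the data centered at the weighted mean."""
    centered = X - (X @ weights)[:, None]
    root = np.sqrt(weights)
    return root[:, None] * (centered.T @ centered) * root[None, :]

def rv_data(X, Y, weights):
    """Sample RV-coefficient between two configurations of the same n samples."""
    sx, sy = weighted_gram(X, weights), weighted_gram(Y, weights)
    return np.trace(sx @ sy) / np.sqrt(np.trace(sx @ sx) * np.trace(sy @ sy))

def squared_distance(X, Y, weights):
    """Squared Frobenius distance between the normalized matrices."""
    sx, sy = weighted_gram(X, weights), weighted_gram(Y, weights)
    diff = sx / np.linalg.norm(sx, "fro") - sy / np.linalg.norm(sy, "fro")
    return np.linalg.norm(diff, "fro") ** 2

rng = np.random.default_rng(3)
X = rng.normal(size=(2, 8))               # 2 variables, 8 samples
Y = rng.normal(size=(3, 8))               # 3 variables, same 8 samples
weights = np.full(8, 1.0 / 8)             # assumed uniform probability weights

rv = rv_data(X, Y, weights)
dist2 = squared_distance(X, Y, weights)

# Rotate, scale and translate X: the RV-coefficient should not change.
rotation, _ = np.linalg.qr(rng.normal(size=(2, 2)))
X_moved = 3.0 * rotation @ X + np.array([[1.0], [-2.0]])
rv_moved = rv_data(X_moved, Y, weights)
```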

UNIFYING  FRAMEWORK 

In the previous sections, we have shown how the (exact) stochastic realization
and the (real) signal modeling benefit from the use of a certain measure with
similar interpretation. Both have been introduced by Escoufier [7,8]. In this
section, a more abstract representation is developed. The motivation starts from
the observation that for the stochastic realization problem, the underlying space
$L_2^p(\Omega,\mathcal{B},P)$ and, in the real data case, the space $R^{p \times n}$ are isomorphic with the
tensor product spaces, respectively

$$L_2^p(\Omega,\mathcal{B},P) \cong R^p \otimes L_2(\Omega,\mathcal{B},P) \tag{54}$$

$$R^{p \times n} \cong R^p \otimes R^n \tag{55}$$

In general now, let G and H be separable Hilbert spaces. Let $\{\phi_i\}$ be a CONS in
G, and $\{\psi_i\}$ a CONS in H; then any vector x in the tensor product space $G \otimes H$ has a
decomposition

$$x = \sum_i x_i \otimes \phi_i \tag{56}$$

with $x_i \in H$. In this framework, we define the Associated Operator and Gramian:

Definition: With the decomposition of $x \in G \otimes H$ as in (56), the associated
operator $U_x$ from H to H is defined as

$$U_x : H \rightarrow H : y \rightarrow \sum_i \langle y, x_i \rangle_H\, x_i \tag{57}$$

Although this definition is given with reference to a specific coordinate system,
the following theorem is easily shown.

Theorem: The associated operator $U_x$ is

1. Independent of the choice of a CONS in G.

2. Bounded, self-adjoint and positive semidefinite.

3. Hilbert-Schmidt (i.e., $\sum_i \| U_x(\psi_i) \|_H^2 < \infty$).

Definition: Given x in $G \otimes H$, the Gramian $G_x$ is the operator $G_x$ from G to G,
$g \rightarrow G_x(g)$, such that

$$G_x(\phi_i) = \sum_j \langle x_i, x_j \rangle_H\, \phi_j \tag{58}$$

Note that the above gives a definition of $G_x$ via its action on a CONS. Again,
the following is easy to show:

Theorem: The above defined Gramian $G_x$

1. Is coordinate independent.

2. Is bounded, self-adjoint, positive semidefinite and Hilbert-Schmidt.

3. Has normalized eigenvectors which form a CONS in G.

Main Property: The operators $U_x$ and $G_x$ have the same eigenvalues (counting their
multiplicity). The eigenspace of $G_x$ corresponds to the eigenspace of $U_x$ under
the mapping

$$X : G \rightarrow H : z \rightarrow X(z), \qquad X(z) = \sum_j z_j x_j \tag{59}$$
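In the finite dimensional case the Main Property reduces to the familiar fact that $X'X$ and $XX'$ share their nonzero eigenvalues. The sketch below assumes $G = R^3$ and $H = R^6$ with random data, so the rows of the array `X` play the role of the vectors $x_i$; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 6))          # row i is the vector x_i in H = R^6

U = X.T @ X                          # associated operator on H: y -> sum_i <y, x_i> x_i
gram = X @ X.T                       # Gramian on G: entries <x_i, x_j>

gram_vals, gram_vecs = np.linalg.eigh(gram)
u_vals = np.linalg.eigvalsh(U)       # ascending order; the first three are (numerically) zero
top_u_vals = u_vals[-3:]

# Map an eigenvector z of the Gramian into H via X(z) = sum_j z_j x_j.
z = gram_vecs[:, -1]
mapped = X.T @ z                     # eigenvector of U for the same eigenvalue
```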

Definition: The "correlation" between $x_1$ in $G_1 \otimes H$ and $x_2$ in $G_2 \otimes H$ is (with $G_1$
and $G_2$ subspaces of G)

$$\mathrm{corr}(x_1,x_2) = \gamma(U_{x_1},U_{x_2}) \tag{60}$$

where $\gamma(\cdot,\cdot)$ is the correlation in the Hilbert space L(H) of the associated
operators.

Theorem: The correlation defined above is

1. Independent of the chosen CONS.

2. Invariant w.r.t. orthogonal transformations, scale transformations, and
translations.

3. $$\langle U_1, U_2 \rangle_{L(H)} = \mathrm{Tr}(G_{21} G_{12}) \tag{61}$$

It follows now from this last theorem that

$$\mathrm{corr}(x_1,x_2) = \frac{\mathrm{Tr}\,G_{21}G_{12}}{\sqrt{\mathrm{Tr}\,G_{11}^2\;\mathrm{Tr}\,G_{22}^2}}$$

Remark: A complete duality exists between the roles played by G and H.

GEOMETRICAL INTERPRETATION: CORRELATION BETWEEN SUBSPACES


Let G be the Hilbert space containing the "observations" $\{y_k\}$. Denoting by $M \wedge N$
the largest subspace contained in both M and N, and by $M \vee N$ the smallest subspace
containing both M and N, the set of subspaces which are closed under $\wedge$ and $\vee$ have
a lattice structure. If $M^\perp$ denotes the usual orthogonal complement in G, then
the subspaces of G form a complete orthocomplemented lattice (or logic [22]). It is
postulated that the propositions of a physical system are a complete
orthocomplemented lattice [15]. Since the propositional calculus of a physical system
has a similarity to the corresponding calculus of formal logic, one refers to
this often as "quantum logic."

The set of subspaces of a Hilbert space is isomorphic to the set of
orthoprojectors Proj G on G. With each subspace, there corresponds one projector (namely
the one whose range is exactly that subspace), and vice versa. Consider now the
vector valued measure on the subspaces (projectors)

$$\delta_y(A) = P^A y \tag{63}$$

where $P^A$ denotes the orthoprojector on subspace A. This $\delta$ has the interesting
property that if $A \perp B$, then $\delta(A) \perp \delta(B)$, which is the orthogonal scattering
property. Furthermore, if $\{P^i\}$ is a set of pairwise orthogonal subspaces
(projectors), then

$$\sum_i \delta_y(P^i) = \delta_y\Big(\bigoplus_i P^i\Big) \tag{64}$$

The last property is characteristic for a Gleason measure [13], so that $\delta$ is
referred to as an Orthogonally Scattered Gleason (OSG) measure [14]. This
OSG-measure induces a scalar Gleason measure

$$\mu_y(A) = \| \delta_y(A) \|_G^2 \tag{65}$$

Interpreting the right hand side of (65) as a variance, it is natural to
introduce a "covariance" between subspaces as

$$\langle \delta(A), \delta(B) \rangle_G \tag{66}$$

By Gleason's theorem [11], there exists a self-adjoint, trace class operator $T_y$
such that

$$\mu_y(A) = \mathrm{Tr}[T_y P^A] \tag{67}$$

Note that here simply $T_y = yy'$.


Consider now a family of vectors $y_i$ and their corresponding measures $\delta_i$.
Introducing a measure p on the $y_i$, a "mixture" Gleason measure (or linear
superposition) is obtained

$$\delta = \sum_i p_i \delta_i \tag{68}$$

Aragon and Couot [2] show that with any such superposition, there corresponds
again an operator

$$T = \sum_i p_i T_i = \sum_i p_i\, y_i y_i' \tag{69}$$

The correlation between the subspaces A and B is then [14]

$$\frac{\langle \delta(A), \delta(B) \rangle}{\sqrt{\mu(A)\,\mu(B)}} = \frac{\mathrm{Tr}(T P^A P^B)}{\sqrt{\mathrm{Tr}(TP^A)\,\mathrm{Tr}(TP^B)}} \tag{70}$$

Note that T can be interpreted as the sample covariance S. Hence finding the
subspace B of fixed dimension which best "approximates" the given space entails the
maximization of (note that since A = G, then $P^A = I$)

$$\frac{\mathrm{Tr}(SP^B)}{\sqrt{\mathrm{Tr}(S)\,\mathrm{Tr}(SP^B)}} = \frac{\mathrm{Tr}(S_B)}{\sqrt{\mathrm{Tr}(S)\,\mathrm{Tr}(S_B)}} = \sqrt{\frac{\mathrm{Tr}(S_B)}{\mathrm{Tr}(S)}} \tag{71}$$

for $S_B = P^B S P^B$. Clearly the optimal projection $P^B$ is the one projecting on the
eigenspace of S with the largest principal components. If instead of a discrete
measure $\{p_i\}$, one has a continuous measure p, then (70) remains valid if one
replaces the (sample) covariance S by the exact covariance matrix $\Sigma$ for
$G = L_2^p(\Omega,\mathcal{B},P)$.
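The claim that the criterion is maximized by projecting onto the dominant eigenspace of S can be checked numerically. The sketch below uses an assumed synthetic sample covariance of four variables and compares the projector built from the two leading eigenvectors against randomly drawn rank-two orthogonal projectors; the names and the choice of dimension are illustrative.

```python
import numpy as np

def projection_score(S, basis):
    """sqrt(Tr(P S P) / Tr(S)) for the orthoprojector P = basis basis' (orthonormal columns)."""
    projector = basis @ basis.T
    return np.sqrt(np.trace(projector @ S @ projector) / np.trace(S))

rng = np.random.default_rng(5)
scales = np.array([[3.0], [2.0], [1.0], [0.5]])
data = rng.normal(size=(4, 50)) * scales      # 4 variables with unequal spread, 50 samples
S = data @ data.T / data.shape[1]             # sample covariance (zero mean assumed)

eigvals, eigvecs = np.linalg.eigh(S)          # ascending eigenvalues
k = 2
best_score = projection_score(S, eigvecs[:, -k:])

# Scores of random k-dimensional subspaces, for comparison.
random_scores = []
for _ in range(200):
    q, _ = np.linalg.qr(rng.normal(size=(4, k)))
    random_scores.append(projection_score(S, q))
```

The best score equals the square root of the fraction of total variance carried by the k largest eigenvalues, so a clear gap in the eigenvalues of S shows up directly as a plateau in this score.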

The principal component analysis for the exact covariance and real data cases
follows thus nicely from this formalism. What is the connection between this
formalism and the operators developed in the general RV-theory? Rewriting (69)
as

$$T = \sum_i \tilde{y}_i \tilde{y}_i', \qquad \tilde{y}_i = \sqrt{p_i}\, y_i \tag{72}$$

and defining the data matrix $\tilde{Y}$ by $\tilde{Y}' = \mathrm{diag}(\sqrt{p_i})\,Y'$, the operator can be written as

$$T(\cdot) = \sum_{i=1}^{n} \langle \tilde{y}_i, \cdot \rangle_{R^p}\, \tilde{y}_i = U_{\tilde{Y}}(\cdot) \tag{73}$$

where (57) is used (for the dual case) for $G = R^n$ and $H = R^p$. But then for A,
$B \in \mathrm{Proj}\,R^p$ we get

$$\langle TP^A, TP^B \rangle_{L(R^p)} = \mathrm{Tr}\,TP^A TP^B = \mathrm{COVV}(A,B) \tag{74}$$

thus leading to the earlier defined RV coefficient. Note that (74) defines a
"covariance" between operators whose existence follows from Gleason's theorem,
while in (66) we considered directly the "covariance" between the OSG-measures
themselves. T plays the role of the exact or sample covariance matrix, which is in
fact the representation of the Gramian defined earlier.

APPLICATIONS 

The results in this section have been reported in [18] and [19]. They are
included for completeness. Consider the:

General Problem: Given $X \in R^{p \times n}$ and $Y \in R^{q \times n}$, find transformations L and M (i.e.,
metrics), such that L'X and M'Y are as "similar" as possible. As a measure for
similarity, the use of the RV-coefficient was motivated,

$$\mathrm{RV}(L'X,M'Y) = \frac{\mathrm{Tr}(L'S_{12}MM'S_{21}L)}{\sqrt{\mathrm{Tr}(L'S_{11}L)^2\;\mathrm{Tr}(M'S_{22}M)^2}}$$

Remarks: 


1. Since for any orthogonal matrix R, the substitution of L by LR leaves RV
invariant, one may assume without loss of generality that the matrices
$L'S_{11}L$ and $M'S_{22}M$ are diagonal.

2. Instead of finding optimal transformations, equivalent problems are to find
a proper metric or a set of projection hyperplanes.

The familiar statistical modeling techniques can now be brought on a common
ground using this powerful tool [8] and applied to the stochastic realization
problem.

1. Generalized Double-Sided Stochastic Realization

Here we let $X = Y^+$ and $Y = Y^-$, then

$$\begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix} = \begin{bmatrix} R^+ & H \\ H' & R^- \end{bmatrix}$$

With the states for the forward and the backward realization as in (23) and (24)
and diagonal $\Lambda_x$ and $\Lambda_z$, the generalized double-sided stochastic realization
problem [18,19] can be formulated as the maximization of

$$\mathrm{RV}(L'Y^+, M'Y^-)$$

with respect to L and M and subject to

$$L'R^+L = \Lambda_z \tag{77}$$

$$M'R^-M = \Lambda_x \tag{78}$$

Using (diagonal) Lagrange multipliers $\Lambda$ and $\Psi$, the problem is reduced to an
unconstrained optimization problem. Its conditions for optimality are

$$HMM'H'L - R^+L\Lambda = 0 \tag{79}$$

$$H'LL'HM - R^-M\Psi = 0 \tag{80}$$

It follows that

$$\Lambda \Lambda_z = \Lambda_x \Psi$$

which together with (79) and (80) leads to the generalized eigenvalue-eigenvector
problem

$$H(R^-)^{-1}H'L = R^+L\Gamma \tag{82}$$

$$H'(R^+)^{-1}HM = R^-M\Gamma \tag{83}$$

with $\Gamma$ the eigenvalue matrix. $\Gamma$ is related to $\Lambda$ and $\Psi$ by [8]


» -  v 

The  optimal  transformations  L  and  M  are  the  solution  to 

L .  (rVW-’/V’/V/2 

m  -  (RV,i,r"1/V1/2A*/2 

Z  X 

For the particular choices of \Lambda_x and \Lambda_z made, the maximal RV coefficient is

    RV(L'Y^+, M'Y^-) = \frac{\mathrm{Tr}(\Lambda_x \Gamma \Lambda_z)}{\sqrt{\mathrm{Tr}(\Lambda_x^2) \, \mathrm{Tr}(\Lambda_z^2)}}

Note that if one of \Lambda_x and \Lambda_z is the identity, then choosing the other as \Gamma, the
diagonal matrix of the squared canonical correlations, maximizes RV. Ramos [18] has
shown that the input normal, the balanced [5], and the output normal [1] stochastic
realizations follow from this choice. Baram's realization [4] follows by
taking both equal to I.
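The generalized eigenproblem (82) can be sketched numerically as follows, for the simplest choice where both constraint matrices are the identity. The function name `canonical_transformations`, the Cholesky-based reduction to a symmetric eigenproblem, and the synthetic "future" and "past" data are assumptions of this illustration; H is assumed to have full rank so that no squared canonical correlation is zero.

```python
import numpy as np

def canonical_transformations(R_plus, R_minus, H):
    """Solve H (R-)^{-1} H' L = R+ L Gamma with L' R+ L = I and M' R- M = I."""
    C = np.linalg.cholesky(R_plus)                   # R+ = C C'
    K = H @ np.linalg.solve(R_minus, H.T)            # H (R-)^{-1} H'
    W = np.linalg.solve(C, np.linalg.solve(C, K).T)  # C^{-1} K C^{-T}
    W = 0.5 * (W + W.T)                              # remove rounding asymmetry
    gamma, U = np.linalg.eigh(W)                     # squared canonical correlations
    L = np.linalg.solve(C.T, U)                      # L = C^{-T} U
    M = np.linalg.solve(R_minus, H.T @ L) / np.sqrt(gamma)  # scale column j by gamma_j^{-1/2}
    return L, M, gamma

# Synthetic zero-mean data: 'future' is a noisy mixture of 'past'.
rng = np.random.default_rng(1)
N = 2000
past = rng.standard_normal((3, N))
future = rng.standard_normal((3, 3)) @ past + 0.5 * rng.standard_normal((3, N))
R_plus = future @ future.T / N
R_minus = past @ past.T / N
H = future @ past.T / N

L, M, gamma = canonical_transformations(R_plus, R_minus, H)
n = len(gamma)
# With both constraint matrices equal to I, the RV value reduces to Tr(Gamma) / n.
rv = np.trace(L.T @ H @ M @ M.T @ H.T @ L) / n
```

With this normalization L'HM is diagonal with the canonical correlations on its diagonal, which is a convenient check on the computed pair (L, M).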


2.  The  General  Double-Sided  Stochastic  Realization 


This corresponds with the principal component analysis [3], constraining L to be
the identity, thus maximizing

    RV(Y^+, M'Y^-)

subject to (78).

The solution is

    M = (R^-)^{-1} H' Q   (89)

for some orthogonal matrix Q. The maximal RV is

    RV(Y^+, M'Y^-) = \left[ \frac{\mathrm{Tr}\,[H (R^-)^{-1} H']^2}{\mathrm{Tr}\,(R^+)^2} \right]^{1/2}   (90)


CONCLUSION 

The RV-coefficient framework enables the unification of the theory of stochastic
realization, and in fact provides a valuable tool to directly compare different
modeling or model reduction schemes. The double-sided solution involves canonical
correlations, as opposed to variances in the one-sided solution. The formalism
applies to the exact covariance as well as the real data case.

The relations of the RV-coefficient to projection measures have been explored.
This is currently being further investigated, and it is hoped that it will
yield more insight into the dynamical aspects of the real data case.
Abstract. It has been well established that the
so-called balanced state space realizations of linear
time-invariant systems have certain desirable
computational properties. But the complete relationship
between balanced realizations and the general parameter
sensitivity problem is not yet fully appreciated.

In this paper we present a geometric approach to
the robust design problem and its solution using a
specific optimality criterion. In particular, for
discrete-time linear systems, a minimum sensitivity
parameterization of a linear system with a given
input-output relationship will be linked to the balanced
realization.


Notation


(The following is a summary of the notation used in
this paper. All nonstandard terminology will be
defined in the body of the paper.)
set of real numbers
set of natural numbers

set of parameters

set of i x j real matrices

ith eigenvalue of matrix T
system order

number of system inputs, outputs
minimum of (n+1)m and (n+1)p
equivalent to (n+1)nm
equivalent to (n+1)np
equivalent to s+t
q-dimensional parameter space
point in \Theta, i.e., a parameter set
extremal sensitivity point in \Theta
observable functional
gradient vector of functional f

Hessian matrix of functional f

third order tensor of functional f with
respect to \theta
observable value

manifold in \Theta induced by functional f and
observable value k
equivalence relation on \Theta
sensitivity performance index
Lagrange multiplier constant
sensitivity Hamiltonian with respect to k

state space realization triple
reachability, observability matrices

ith column of matrix O
i,jth element of matrix O

system Hankel matrix
n x n identity matrix

n  x  a  identity  matrix 
jth singular value
Kronecker delta function

Kronecker product
column stacking operator
trace operator
Frobenius norm


1. The Problem Definition and Historical Context


This paper deals with a new geometric approach to
the robust design problem. Classically, the sensitivity
properties of a given realization have been investigated
either via a 'sensitivity system,' which is
usually prohibitively large [12], or alternatively, via
the operator form [11]. Muller and Weber determine the
control and observation sensitivity in [10] and maximize
scalar measures for the 'quality' with respect to
certain structural parameters. The question of robustness,
with respect to variations of certain structural
parameters, is closely related to this problem and
treated by Ackermann in [1]. Finally, sensitivity
analysis from a geometric point of view was recently
introduced by Delchamps [9] and applied to compensation
and feedback. Our approach will be geometric in nature
as well, but strictly from a realization perspective.

Consider a linear time-invariant system with m
inputs and p outputs. For our applications, this may
be a model for a real system one wants to simulate, or
the implementation of a digital or analog filter, or an
observer-controller for implementing an optimal regulator
in some given plant. In all these applications,
only the relationship between the input and output of
the implemented device is important. It is now well
known that many equivalent state space realizations
exist for one and the same input-output behavior.
Usually the so-called 'canonical forms' are implemented
because they minimize the number of parameters that are
required and allow for a pipelined realization of the
devices (e.g., the 'direct forms' in digital signal
processing). A minimal number of parameters corresponds
to minimal complexity, a quality that may be
important if the operation count is significant.
However, a minimal set of parameters has no redundancy,
and therefore, one may expect high sensitivity with
respect to these parameters.

This paper investigates how the nonuniqueness of
the state space realizations can be utilized to determine
optimal parameterizations under various measures
of 'optimality.' In particular, we shall look at
realizations which exhibit a minimal sensitivity with
respect to variations in parameters derived from the
reachability and observability matrices.

In general, an n-dimensional linear system with m
inputs and p outputs has a full parameterization
consisting of n(n+m+p) parameters [5]. Furthermore, it
has been demonstrated that such a system can be minimally
parameterized by no more than n(m+p) parameters
[2]. We shall find it mathematically convenient,
however, to work with a larger, albeit redundant,
parameter set. Specifically, we will consider a
q = (n+1)n(m+p)-dimensional affine space such that each
point in the space represents a particular reachability
and observability matrix for a particular linear
system. The parameter space is not required to have a
vector space structure. This is because additions and
scalar multiples of such matrices have no meaningful
interpretation. The space is given the structure of a
Riemannian manifold by introducing a Euclidean metric
in the tangent space at each point.
Clearly, in this space certain connected subsets
will exist such that all points on such a subset
correspond to a system with the same input-output behavior.
In fact, we can resolve the whole space into disjoint
sets corresponding to different input-output behavior.
For a particular parameterization, the proximity of
neighboring sets or 'leaves' will be an indication of
the robustness and sensitivity of the parameterization
with respect to perturbations of the individual
parameters. Hence, optimal parameterizations are those
for which the 'inter-leaf' distances are maximal.

2. Geometric Approach to the
Robust Design Problem

Suppose we have an affine space \Theta and some
smooth functional f: \Theta \to R. Then we can define
M_k(f) = \{\theta \in \Theta : f(\theta) = k\} as the 'level surfaces'
induced by f and the scalar k on the space \Theta.
Essentially, f generates equivalence classes on \Theta with
\theta^1 ~ \theta^2 if and only if f(\theta^1) = f(\theta^2) for
\theta^1, \theta^2 \in \Theta. This notion motivates the
following definition.

Definition 1: The functional f will be called an
observable functional over the parameter space \Theta. The
scalar f(\theta) at a particular \theta \in \Theta will be referred to as
the observable value at \theta.

Using the construction above, we see that parameter
sets which give the same observable value, k, are
essentially indistinguishable with respect to the given
observable f. In the context of the study of parameter
sensitivity, an important question herein is to determine
which parameter sets, if any, in a given equivalence
class M_k have the least or greatest propensity
to change the observable value when the individual
parameters in the sets are perturbed. This problem can
obviously be addressed by examining the gradient of f
over the manifold M_k.

Suppose we are given a parameter set \theta_0 \in M_k for
some fixed k. It is clear that we can perturb \theta_0 in an
infinite number of directions and, in general, the
observable value will be perturbed as well. The
direction with the greatest influence on k, i.e., the
greatest directional derivative in magnitude, is in the
direction of the vector f_\theta(\theta_0), where

    f_\theta(\theta_0) = gradient of f evaluated at \theta_0

Alternatively, we can view the space \Theta as being
composed of level surfaces or 'leaves,' where a given
leaf corresponds to a specific equivalence class M_k.
The relative spacing between such leaves is measured by
moving normal to the surface of one leaf until another
is reached. This normal direction at a point on a
surface is given by the gradient at that point, and the
normal distance in this case can be best characterized
through the magnitude of the gradient (to first order,
the spacing between neighboring leaves is inversely
proportional to it). Thus, we immediately
get the definition below.

Definition 2: A parameter set \theta^* \in M_k is an extremal
sensitivity point in M_k if and only if \theta^* minimizes or
maximizes L(\theta) over M_k, where

    L(\theta) = \frac{1}{2} \| f_\theta(\theta) \|^2   (1)

Note, the extra square function and factor of 1/2 do
not change our conclusion above and were added only as
an algebraic convenience.


Using methods from the calculus of variations, we
can further characterize and determine the extremal
sensitivity points of M_k. Specifically, let

    H_k(\theta) = L(\theta) - \lambda (f(\theta) - k)   (2)

where \lambda is a Lagrange multiplier constant. Then we
have the following definition.

Definition 3: A point \theta on M_k is said to be an
extremal sensitivity point if

    dH_k(\theta)/d\theta = (f_{\theta\theta}(\theta) - \lambda I) f_\theta(\theta) = 0   (3)


In other words, the extremal sensitivity points have
gradient vectors which are eigenvectors of their
Hessian matrix f_{\theta\theta}. We can ascertain whether a
particular extremal sensitivity point has maximum
sensitivity or minimum sensitivity by examining the
definiteness of the Hessian matrix.
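To make the extremal condition concrete, the sketch below uses a hypothetical two-parameter observable f(\theta) = \theta_1 \theta_2, a scalar analogue of a product factorization. This toy functional and all names in the code are illustrative assumptions and do not appear in the paper. On the level set f = k the index L(\theta) is smallest at the "balanced" point \theta_1 = \theta_2 = \sqrt{k}, where the gradient is an eigenvector of the Hessian with \lambda = 1.

```python
import numpy as np

def f(theta):
    # Hypothetical bilinear observable (illustration only).
    return theta[0] * theta[1]

def grad_f(theta):
    return np.array([theta[1], theta[0]])

HESS_F = np.array([[0.0, 1.0], [1.0, 0.0]])   # constant Hessian of f

def sensitivity_index(theta):
    # L(theta) = 1/2 * ||grad f(theta)||^2, as in equation (1).
    g = grad_f(theta)
    return 0.5 * g @ g

k = 4.0
theta_star = np.array([np.sqrt(k), np.sqrt(k)])
lam = 1.0
# Condition (3): (f_thetatheta - lambda I) f_theta = 0 at the extremal point.
residual = (HESS_F - lam * np.eye(2)) @ grad_f(theta_star)

# Scan the level set f = k, parameterized as (t, k / t), and locate the minimum of L.
ts = np.linspace(0.5, 8.0, 2001)
values = np.array([sensitivity_index(np.array([t, k / t])) for t in ts])
t_best = ts[np.argmin(values)]
```

The scan confirms that moving along the level set away from the balanced point (making one parameter large and the other small) only increases the index.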

We now apply these general results in the context
of discrete-time linear system theory and show how they
can be used to solve the robust design problem.

3.  An  Application  for  Robust  Design 
of  Discrete-Time  Systems 

The parameter sensitivity of state space realizations
of linear time-invariant systems has been studied
by many investigators during the past decade [7,9-12].
Much of the research has been motivated by the desire
of system designers to take into account the effects
of uncertainty inherent in all practical systems.
Frequent sources of uncertainty include imprecise
knowledge of the 'given' plant parameters, external
disturbances to the plant, and roundoff and quantization
error in the finite precision components of the
control system.

It has been well established that balanced (in the
sense of Moore [9,11]) state space realizations of
linear time-invariant systems have certain desirable
computational properties [7,9]. But the complete
relationship between balanced realizations and the
general parameter sensitivity problem is not yet fully
understood. What follows is an attempt to use the
results of the previous section to demonstrate further
linkage between the two concepts.


Consider a system described by the state space
model (A,B,C) where A \in R^{n \times n}, B \in R^{n \times m}, C \in R^{p \times n}. At this
point, we need not assume that the triple describes
specifically a continuous or discrete-time system.
It is well known that such a linear system can also
be uniquely specified by an (n+1)p x (n+1)m matrix of
Markov parameters known as the Hankel matrix, say H.
Furthermore, the Hankel matrix can be shown to always
permit the factorization

    H = OR   (4)
Now assume that we wish to realize a linear system
specified uniquely by a given Hankel matrix, H. The
parameterization we shall consider is a parameter set
consisting of the (n+1)n(m+p) components in the
matrices O and R.
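The factorization (4) and the parameter count can be checked numerically with a short Python sketch. The helper names and the example triple (A, B, C) are assumptions chosen for illustration; O and R are taken here to be the usual observability and reachability matrices with n+1 blocks each.

```python
import numpy as np

def observability_matrix(A, C, blocks):
    """Stack C, CA, ..., CA^(blocks-1)."""
    rows = [C]
    for _ in range(blocks - 1):
        rows.append(rows[-1] @ A)
    return np.vstack(rows)

def reachability_matrix(A, B, blocks):
    """Concatenate B, AB, ..., A^(blocks-1) B."""
    cols = [B]
    for _ in range(blocks - 1):
        cols.append(A @ cols[-1])
    return np.hstack(cols)

def hankel_matrix(A, B, C, blocks):
    """Block Hankel matrix whose (i, j) block is the Markov parameter C A^(i+j) B."""
    markov = []
    M = B.copy()
    for _ in range(2 * blocks - 1):
        markov.append(C @ M)
        M = A @ M
    return np.block([[markov[i + j] for j in range(blocks)] for i in range(blocks)])

# Example system (an arbitrary choice): n = 2 states, m = 1 input, p = 1 output.
A = np.array([[0.5, 0.2], [0.0, 0.3]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])
n, m, p = 2, 1, 1

O = observability_matrix(A, C, n + 1)   # (n+1)p x n
R = reachability_matrix(A, B, n + 1)    # n x (n+1)m
H = hankel_matrix(A, B, C, n + 1)       # (n+1)p x (n+1)m
n_params = O.size + R.size              # (n+1) n (m+p)
```

For this example the parameter set has 12 components, while H itself has rank n = 2, which is the redundancy the text refers to.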

Definition 4: Let T be a class of (n+1)m x (n+1)p
matrices with the defining property that \Lambda \in T if and
only if \Lambda has the singular value decomposition

when m > p

Clearly, when m = p, then T is simply the class of
(n+1)m x (n+1)m orthogonal matrices.


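Proof: Let E have the singular value decomposition

Decomposing \Lambda, then, as in (5), we have that

    \mathrm{Tr}(\Lambda E) = \mathrm{Tr}\Big( \sum_{i=1}^{r} v_i u_i^T \cdot \sum_{j=1}^{r} \sigma_j u_j v_j^T \Big)
                  = \sum_{i=1}^{r} \sum_{j=1}^{r} \sigma_j \, \mathrm{Tr}( v_i u_i^T u_j v_j^T )
                  = \sum_{i=1}^{r} \sigma_i

Since Tr(\Lambda E) = 0 by hypothesis and, by the definition
of the singular value decomposition, \sigma_i \ge 0 for all
i \in N, it follows that E = 0.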


Lemma 1 provides us with an observable functional
that can be used successfully in the context of the
previous section. Specifically, let

    f(\theta) = \mathrm{Tr}\, \Lambda (H - OR) ,  \Lambda \in T   (8)
Theorem 1: The extremal sensitivity points of M_0(f)
have the property that

    R R^T = O^T \Lambda^T \Lambda O   when p > m ,

    R \Lambda \Lambda^T R^T = O^T O   when m > p .


Proof: There are several ways to prove this result.
The following method takes advantage of some matrix
calculus 'shorthand' notation to do the matrix
manipulation [1]. From equation (8), we have

    f(\theta) = \mathrm{Tr}\, \Lambda (H - OR) ,  \Lambda \in T
Lemma 1: Let E be an (n+1)p x (n+1)m matrix such
that Tr(\Lambda E) = 0 for all \Lambda \in T. Then it follows that
E = 0.


The optimality condition is then

Making the appropriate substitutions, it follows that

where s = (n+1)nm and t = (n+1)np. Equation (12) gives

    \pm (I \otimes \Lambda)\, \mathrm{vec}(O) = \mathrm{vec}(R^T)   when p > m ,   (13a)

    \mathrm{vec}(O) = \pm (I \otimes \Lambda)^T \mathrm{vec}(R^T)   when m > p .   (13b)

We have used in (13a,b) the requirement that \lambda = \pm 1 for
the system given by (12) to be consistent.


Now expand (13a):
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But  by  the  definition  of  the  aingular  value  decoapoei- 


    \pm \Lambda O_i = (R^T)_i ;  i = 1, 2, ...
where vec(·) is the column stacking operator from
matrix calculus [3]. Clearly then we shall be interested
in the equivalence class M_0(f) where by Lemma 1
it follows that H = OR.


    R R^T = O^T \Lambda^T \Lambda O


Likewise, from (13b) we get

    R \Lambda \Lambda^T R^T = O^T O .


We now restrict our attention to the case where
m = p such that \Lambda and, consequently, \Lambda^T are simply
orthogonal matrices.


Definition 5: Realization (A,B,C) with m = p is said
to be an essentially balanced realization if R R^T = O^T O.


The definition above is motivated by the fact that
in the discrete-time case R R^T and O^T O are, respectively,
the controllability and observability grammians at time
t = n. Thus, an essentially balanced realization can
be converted to a true finite-interval discrete-time
balanced realization (see [14]) by simply an orthogonal
state space transformation T such that

    T (R R^T) T^T = T (O^T O) T^T = \mathrm{diag}(\sigma_1, \sigma_2, ..., \sigma_n) = \Sigma .

In this case, T can be interpreted as a series of
rotations of the state space's coordinate axes. From
this observation it follows then that only the scaling
of the state variables has an effect on the sensitivity
of the parameters {R_{ij}} and {O_{ij}}. Another important
property of essentially balanced realizations is
illustrated in the following lemma.

Lemma 2: An essentially balanced realization has the
minimum sensitivity property.

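Proof: As alluded to in the previous section, we can
test for the sense of the optimality by examining the
definiteness of the Hessian matrix

    f_{\theta\theta\theta} f_\theta + (f_{\theta\theta} - \lambda I_q) f_{\theta\theta}

For our particular choice of observable, f, we have
f_{\theta\theta\theta} = 0 since it is bilinear in R_{ij} and O_{ij}. Since
f_{\theta\theta} is an orthogonal symmetric matrix, it follows that
f_{\theta\theta}^2 = I_q, i.e., the matrix f_{\theta\theta} is similar to a signature
matrix. Thus there exists a similarity transform T
such that

    T f_{\theta\theta} T^{-1} = \mathrm{diag}(\pm 1, \pm 1, \pm 1, ..., \pm 1)

    T (f_{\theta\theta} - \lambda I_q) T^{-1} = \mathrm{diag}(\pm 1 - \lambda, \pm 1 - \lambda, \pm 1 - \lambda, ..., \pm 1 - \lambda) .

Hence,

    T (f_{\theta\theta} - \lambda I_q) f_{\theta\theta} T^{-1} = \mathrm{diag}(1 \mp \lambda, 1 \mp \lambda, 1 \mp \lambda, ..., 1 \mp \lambda) .

For \lambda = 1 and \lambda = -1, each diagonal entry 1 \mp \lambda is
either 0 or 2, so this matrix is positive semi-definite.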

The fact that the Hessian matrix is semi-definite
leaves open the possibility that there may be some
coordinate directions one can move in the parameter
space \Theta that have no influence on the observable value.
Hence, the observable would be completely insensitive
to these parameters.

There are several interesting observations to be
made with respect to the minimum sensitivity criterion.
First, note that we worked specifically with a Hankel
matrix H \in R^{(n+1)p \times (n+1)m}. It is well known that the
Hankel matrix must be at least this large in order to
specify the system uniquely. However, there is no
reason why the above derivations could not have been
carried out using a larger system Hankel matrix, i.e.,
using Markov parameters of higher degree. In fact, if
it is assumed that we are working with a stable system,
then we could have used the doubly infinite Hankel
matrix in the derivations. In this latter case, the
criterion given by Theorem 1 is satisfied by requiring
that the system realization be an infinite-interval
discrete-time balanced realization. Second, note that
the minimum sensitivity criterion did not use implicitly
the fact that the Hankel matrix given corresponded
to that of a discrete-time system. This assumption
simply allowed us to put the results immediately in a
balanced realization framework. The continuous-time
problem is slightly more difficult to solve and will be
presented in a later publication [15].

Finally, it is instructive to display the performance
index explicitly for the observable functional
given in equation (8). Substituting equation (10b)
into (1), it follows that

    L(\theta) = \frac{1}{2} \left\| \begin{bmatrix} \mathrm{vec}(O) \\ \mathrm{vec}(R) \end{bmatrix} \right\|^2 .   (16)

Hence, the effect of minimizing L(\theta) while constraining
H = OR is to make the components of O and R roughly the
same order of magnitude. Such a situation would be
desirable if these parameters were to be quantized for
fixed point data registers.
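The equalizing effect described above can be illustrated with a small Python sketch. Starting from a deliberately badly scaled realization, it computes a state transformation that makes R R^T and O^T O equal (here both diagonal, i.e., a fully balanced special case of an essentially balanced pair) and compares the index L = (1/2)(||O||_F^2 + ||R||_F^2) before and after. The square-root balancing construction, the function names, and the example system are assumptions of this sketch, not the authors' algorithm.

```python
import numpy as np

def factors(A, B, C, blocks):
    """Observability matrix O and reachability matrix R with `blocks` blocks each."""
    O = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(blocks)])
    R = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(blocks)])
    return O, R

def equalizing_transformation(O, R):
    """Return T such that (T R)(T R)' = (O T^-1)'(O T^-1) = diag(sigma)."""
    Wc = R @ R.T                        # finite-interval reachability grammian
    Wo = O.T @ O                        # finite-interval observability grammian
    Lc = np.linalg.cholesky(Wc)         # Wc = Lc Lc'
    s2, U = np.linalg.eigh(Lc.T @ Wo @ Lc)
    sigma = np.sqrt(s2)                 # singular values of H = O R
    T = np.diag(np.sqrt(sigma)) @ U.T @ np.linalg.inv(Lc)
    return T, sigma

def sensitivity_index(O, R):
    return 0.5 * (np.sum(O**2) + np.sum(R**2))

# Badly scaled example: large input gain, small output gain (arbitrary choice).
A = np.array([[0.5, 0.2], [0.0, 0.3]])
B = np.array([[10.0], [10.0]])
C = np.array([[0.1, 0.0]])

O, R = factors(A, B, C, blocks=3)
T, sigma = equalizing_transformation(O, R)
R_bal = T @ R                           # transformed reachability matrix
O_bal = O @ np.linalg.inv(T)            # transformed observability matrix

index_before = sensitivity_index(O, R)
index_after = sensitivity_index(O_bal, R_bal)
```

The product O R, and hence the Hankel matrix, is unchanged by the transformation, while the index drops to the sum of the singular values of H.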


4. Computational (Algorithmic) Aspects


A few comments are in order concerning how an
essentially balanced realization might be computed
given a specific Hankel matrix. Clearly, there are
matrix algebraic methods possible similar in flavor to
those used to compute balanced realizations. We shall
concentrate in this case, however, on a numerical
optimization approach that follows very naturally from
the theoretical structure put forth up to this point.


From our discussion, it follows that an essentially
balanced realization will minimize the performance
index given in equation (16), subject to the
constraint that H = OR. Lemma 1 gave us a convenient
way to adjoin this constraint while keeping the
Hamiltonian simple and differentiable in the components
of O and R. The price paid, however, was the introduction
of an arbitrary orthogonal matrix \Lambda. Clearly,
such a formulation does not lend itself easily to the
use of numerical algorithms. To remedy this problem,
form the Hamiltonian instead as


H  (8)  -  H  (0,R)  «  — 
O  O  46 


where \|\cdot\| is the Frobenius norm. Now the optimization
can be carried out explicitly on H_0 with respect to O
and R and the scalar \lambda. It should be noted, however,
that from an analytic viewpoint equation (16) is often
more difficult to work with. Finding an essentially
balanced realization is now reduced to solving an
optimization problem where H_0 is to be minimized with
respect to the components of O and R. One approach to
solving such a problem is to employ a relaxation-type
algorithm. Specifically, one can perform a sequence of
one-dimensional gradient based optimizations until an
essentially balanced pair (O,R) is computed to desired
precision. The B and C matrices of the realization can
then be immediately identified from the first block
column of R and the first block row of O, respectively.
The A matrix follows, for example, from

    A = \bar{R} R^{+}   (18)

where R^{+} is the pseudo-inverse of R and \bar{R} is found by
shifting the right-most block column of R into R [7].
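The identification step can be sketched in Python as follows. The shift used here, which pairs the first n block columns of R with the last n block columns, is one common variant and is an assumption of this sketch; the exact construction of the shifted matrix is the one given in [7]. The function name and the example system are likewise hypothetical.

```python
import numpy as np

def realization_from_factors(O, R, m, p):
    """Recover (A, B, C) from O ((n+1)p x n) and R (n x (n+1)m).

    Assumes R = [B, AB, ..., A^n B] with its first n block columns of full row rank.
    """
    B = R[:, :m]                # first block column of R
    C = O[:p, :]                # first block row of O
    R_left = R[:, :-m]          # [B, AB, ..., A^(n-1) B]
    R_shift = R[:, m:]          # [AB, ..., A^n B] = A @ R_left
    A = R_shift @ np.linalg.pinv(R_left)
    return A, B, C

# Build O and R from a known system, then recover the triple.
A_true = np.array([[0.5, 0.2], [0.0, 0.3]])
B_true = np.array([[1.0], [1.0]])
C_true = np.array([[1.0, 0.0]])
n, m, p = 2, 1, 1
O = np.vstack([C_true @ np.linalg.matrix_power(A_true, i) for i in range(n + 1)])
R = np.hstack([np.linalg.matrix_power(A_true, i) @ B_true for i in range(n + 1)])

A_est, B_est, C_est = realization_from_factors(O, R, m, p)
```

When (O, R) comes from an optimization rather than from a known triple, the same function applies unchanged, with the recovered A accurate to the precision of the computed pair.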


An obvious limitation to this method is that H_0
must be optimized with respect to a large number of
parameters. In fact, for long or infinite time
intervals, such an approach will not be practical.


5. Conclusions and Future Research

Given a square system Hankel matrix, it was shown
that a realization, which satisfies the discrete-time
balanced realization criterion to within an orthogonal
state space transformation, has minimum sensitivity
with respect to perturbations in the components of the
reachability and observability matrices. This was
observed to be true regardless of the size of the
Hankel matrix used provided it was large enough to
specify the system uniquely. It was also suggested
that an optimization-type algorithm could be used to
determine explicitly such a state space realization.

A topic for future research is to determine
whether the nonuniqueness of the essentially balanced
realization can be exploited to find subsets of this
class which have other desirable properties. For
example, in the context of finite wordlength effects,
there is the deterministic effect due to coefficient
truncation and the stochastically modeled effect due to
computation roundoff. Here we have shown that essentially
balanced realizations, i.e., realizations where
the controllability and observability grammians are
equal, have a certain minimal sensitivity property
with respect to parametric perturbation. On the other
hand, Mullis and Roberts have demonstrated that, with
respect to roundoff noise, an optimal wordlength filter
follows from a state space realization where such
grammians are simultaneously diagonal (though not
necessarily equal) [9]. Hence, both optimality
properties are possessed by the usual balanced
realization.

Another topic for future research is to relate
the results stated herein concerning the parametric
sensitivity of the matrix factorization H = OR to the
parametric sensitivity of the state space triple
(A,B,C).
Abstract


There exists a certain correspondence between the problems of
observability and identification. A less familiar correspondence is the one
relating controllability to a "dual" of the identification problem: the
"DESIGN"-problem. This amounts to the choice of a realization or
approximation of a desired system response, using parameters that can only
be approximately adjusted, e.g. due to quantization. A particular
application is in the design of digital filters, simulators and controllers,
which minimize the effects of component tolerances in analog systems or
finite wordlength effects in the digital discrete case.

A geometric approach to the design problem is presented, and its
solution given under a useful criterion for optimality. For linear time
invariant systems, the minimum sensitivity realizations of a desired Hankel
matrix are linked to the Balanced Realizations.
1. THE PROBLEM DEFINITION AND HISTORY


This paper deals with a new geometric approach to the robustness
problem. Classically, the sensitivity properties of a given realization
have been investigated either via a "sensitivity system", which is usually
prohibitively large [TV,F], or alternatively, via the operator form [RMAD].
Muller and Weber determine the control and observation sensitivity in [MW],
and maximize scalar measures for the "quality" with respect to certain
structural parameters. The question of robustness with respect to
variations of certain structural parameters is closely related to this
problem, and treated by Ackermann in [A]. Finally, sensitivity analysis
from a geometric point of view was recently introduced by Delchamps [D], and
applied to compensation and feedback. Our emphasis will be on optimal
implementations of systems with quantized or inaccurate parameters.

Consider a linear time invariant system (A,B,C) with m inputs and p
outputs. For our applications, this may be a model for a real system one
wants to simulate, the implementation of a digital or analog filter, or an
observer-controller implementing an optimal regulator for some given
plant. In all these applications, only the relationship between the input
and the output of the implemented system is important. Usually the
so-called "Canonical Forms" are implemented because they minimize the number of
parameters that are required, and allow for a pipelined realization of the
devices, e.g. the "Direct Forms" in digital signal processing. A minimal
number of parameters corresponds to minimal complexity, a quality that may
be important if the operation count is significant. However, a minimal
set of parameters has no redundancy, and therefore one may expect high
sensitivity with respect to these parameters.

This paper investigates how the nonuniqueness of the state space
realizations can be utilized to determine optimal parametrizations under
various measures of "optimality". In particular, two issues seem to be
important for the practical implementation of a given transfer matrix:
sensitivity and clustering. The minimal sensitivity requirement guarantees
that the actual realized transfer matrix is "close" to the nominal transfer
matrix. Clustering deals with the desired parameter values. If for
instance a fixed point implementation is used, then it is desirable to have
all parameter values in some range or ranges. It relates to the problem of
realizing an approximation to a given system with parameters chosen from a
finite set with fixed values. This paper focuses on the first problem.

Our approach to the problem is geometric, as in [D]. A full
parametrization of the system has n^2 + np + nm parameters. A minimal
parametrization on the other hand requires n(1+m) or n(1+p) parameters if p = 1
or m = 1 [H], or if p and m are both larger than 1, somewhere between
n(m+1)+p(p+1) and n(m+p) parameters [B].

Because addition and scalar multiplication of systems have no
meaningful natural interpretations, the parameter space is simply assumed to
have the structure of an affine space of dimension n(n+m+p). Each point in
this space represents a particular realization of an m input, p output
linear time invariant system of order n (or less). The space is given the
structure of a Riemannian manifold by introducing a Euclidean metric in the
tangent space at each point. For instance, in the analysis and design of the
finite wordlength effects with fixed point processing, a uniform metric for
all tangent spaces is appropriate, whereas for floating point processing, a
metric varying smoothly from point to point is more appropriate.

Clearly, in this space, certain connected subsets will exist such that
all points on such a subset correspond to a system with one and the same
input output behavior. In fact we can resolve (i.e. partition into
equivalence classes) the whole space into disjoint sets, corresponding to
different input output behaviors. For a particular realization, the
proximity of neighboring sheets will be an indication of the robustness or
sensitivity of this realization. Hence optimal realizations are those at
which the "inter-sheet" distances are maximal. These geometric notions are
made precise in section 3, after giving a more philosophical introduction in
section 2 on the design problem and its relation with other systems
problems. This theory is applied to systems design in section 4. The most
interesting result is the one relating the minimum sensitivity (under the
fixed point metric) realizations to the balanced realizations.

2  SITUATION  OF  THE  PROBLEM 

A rather unusual viewpoint, due to Root [R], considers the phenomenon
"linear system" as a mapping $\sigma$ from a suitable subset of the cartesian
product of input functions ($U$) and realizations ($\Sigma$) to the set of output
functions ($Y$). The restriction to a certain subset (we will not go into the
details of this) is necessary for the convergence issues. For continuous
linear time-invariant systems, the mapping stands for the convolution
operator

$$\sigma : U \times \Sigma \to Y : (u(\cdot), S) \mapsto y(\cdot), \qquad y(t) = \int_{-\infty}^{t} C e^{A(t-\tau)} B u(\tau)\, d\tau .$$

For discrete systems a similar expression results. We can now look at the
marginal maps derived from the linear system operator. In particular, if
$S = (A,B,C)$ is fixed, we define the usual linear input-output map as

$$\sigma_S : U \times \{S\} \to Y : u(\cdot) \mapsto y(\cdot) .$$

On the other hand, for a fixed input $u(\cdot)$, the marginal maps

$$\sigma_u : \{u\} \times \Sigma \to Y : S \mapsto y(\cdot)$$

associate with each realization $S$ e.g. the impulse response $h(t)$ if
$u(t) = \delta(t)$, or the transfer function $H(p)$ characterizing the steady state
response to a sinusoid $u(t) = e^{pt}$ of complex radial frequency $p$.
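
As a small illustration of the marginal map for the input $u(t) = e^{pt}$, the sketch below evaluates the standard transfer function $H(p) = C(pI - A)^{-1}B$ of a fixed two-state realization. The helper name `transfer_function` and the numerical realization are assumptions chosen only for illustration; they do not come from the text.

```python
def transfer_function(A, B, C, p):
    """Evaluate H(p) = C (pI - A)^{-1} B for a 2-state, single-input,
    single-output realization (hypothetical helper, illustration only)."""
    (a11, a12), (a21, a22) = A
    # Entries of the matrix pI - A.
    m11, m12, m21, m22 = p - a11, -a12, -a21, p - a22
    det = m11 * m22 - m12 * m21
    # Solve (pI - A) x = B with the explicit 2x2 inverse.
    x1 = (m22 * B[0] - m12 * B[1]) / det
    x2 = (-m21 * B[0] + m11 * B[1]) / det
    return C[0] * x1 + C[1] * x2


# Assumed example realization with H(p) = 1 / (p^2 + 3p + 2).
A = [[0.0, 1.0], [-2.0, -3.0]]
B = [0.0, 1.0]
C = [1.0, 0.0]

H_at_j = transfer_function(A, B, C, 1j)  # response to u(t) = exp(jt)
```

For a fixed $p$, the value $H(p)$ is a function of the realization $S$ alone, which is the sense in which a fixed input turns the system operator into a map on the realization space.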


The control and deconvolution problems are inverse problems for the map
$\sigma_S$ in the sense that the former relates to the derivation of a right-inverse
and the latter to a left-inverse of the map. Moreover, a certain causal
structure is implicit in the problem. In designing a control to achieve a
desired output, invariably "future" actions are understood, while in the
deconvolution problem one acts on observed data, and thus relates the "past"
of $u(\cdot)$ and $y(\cdot)$. Similarly, the construction of a left-inverse for $\sigma_u$
pertains to the system identification problem, invariably tied to an
observation of functions or time series, and hence relating the "past" of
$y(\cdot)$ to the system. Finally, finding a right-inverse of $\sigma_u$ is the problem
of "designing" a system with desired "future" behavior.

In the identification problem the measured data are necessarily
corrupted with uncertainties due to the finite observation time and
finite memory effects. It may even be impossible to isolate the phenomenon
of interest from the rest of the universe. Similarly, uncertainties
interfere with the design problem: the parameter values necessarily must
have a finite precision. In order to find "uniquely" an "optimal" solution
to these problems, one introduces a suitable distance measure or norm in the
domain and range spaces [V].

3 MAIN RESULTS

A summary of some known results on the geometry of systems and their
realizations is first given. It is established that the subsets of the
realization space of system realizations which exhibit identical
input/output behavior form smooth manifolds. However, the totality of all
such "sheets" does not have the structure of a foliation since not all these
subsets are manifolds of the same dimension. A restriction on the set of
systems is required in order to avoid such degeneracies. The next
subsection discusses the robust design on an abstract level.

3.1  The  geometric  structure  of  the  realization  space. 

Let $L_{m,n,p}$ be the realization space, i.e. the space of all triples of
matrices $(F,G,H)$ of dimensions $n \times n$, $n \times m$, $p \times n$. Only realizations over
the real field $R$ will be dealt with here. Since there is little significance
to the addition and/or scalar multiplication of realizations, this space is
not endowed with a vector space structure, but rather with that of an affine
space with vector space $R^{n(m+n+p)}$. Hence at each point $S$ there is an
attached vector space $T_S L$ (the tangent space at $S$), isomorphic to $R^{n(m+n+p)}$.

The group $Gl_n(R)$ acts differentiably on the right on $L_{m,n,p}$, via

$$(A,B,C) \mapsto (A,B,C)^T = (TAT^{-1},\ TB,\ CT^{-1}),$$

which of course corresponds to a change of basis in the state space, $z = Tx$.
The quotient topology is non-Hausdorff in general. The restriction to the
completely reachable (or dually, the completely observable) pairs eliminates
many problems. In particular, the action of $Gl_n(R)$ is free (as a consequence
of reachability/observability) and the quotient space (set of orbits)
$M^{cr}_{m,n,p} = L^{cr}_{m,n,p}/Gl_n(R)$ is a smooth (real) analytic (thus certainly $C^\infty$)
differentiable manifold (hence Hausdorff) of dimension $n(m+p)$ [H2]. The set
of equivalence classes of minimal realizations forms an analytic open
submanifold [H2]. In the system identification problem, this space,
called parameter space, plays a crucial role. Its properties have been well
studied, e.g. in relation to the (non)existence of continuous canonical
forms [H2] and degeneration phenomena [H3]. The best one can do based on
input-output data alone is to identify the orbit of a minimal realization $S$,
i.e. parametrize the input-output operator in some (minimal) way, by
choosing a point in $M^{co,cr}_{m,n,p}$. An obvious question is: "Does a sequence
(obtained as more data comes in) of points in $M^{co,cr}_{m,n,p}$ necessarily
converge to a point in $M^{co,cr}_{m,n,p}$?" The answer is no: degeneration
phenomena occur, and so-called generalized systems appear [H2,H3]. It has
been established that there is a continuous canonical form on $L^{co,cr}_{m,n,p}$ if
and only if $\min(p,m) = 1$.
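
The remark that input-output data determine only the orbit of a realization can be checked on a small example. The sketch below applies a change of basis $(A,B,C) \mapsto (TAT^{-1}, TB, CT^{-1})$ and compares the Markov parameters $CA^kB$, which fix the impulse response. All function names and the numerical matrices are assumptions made for illustration.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def markov_parameters(A, B, C, count):
    """Return [CB, CAB, CA^2B, ...], which fix the input-output behavior."""
    params, AkB = [], B
    for _ in range(count):
        params.append(matmul(C, AkB))
        AkB = matmul(A, AkB)
    return params


def change_basis(A, B, C, T, T_inv):
    """Apply (A, B, C) -> (T A T^-1, T B, C T^-1)."""
    return matmul(matmul(T, A), T_inv), matmul(T, B), matmul(C, T_inv)


# Assumed example data (integers keep the arithmetic exact).
A = [[0, 1], [-2, -3]]
B = [[0], [1]]
C = [[1, 0]]
T = [[2, 1], [1, 1]]          # det T = 1
T_inv = [[1, -1], [-1, 2]]    # exact inverse of T

A2, B2, C2 = change_basis(A, B, C, T, T_inv)
same_behavior = markov_parameters(A, B, C, 6) == markov_parameters(A2, B2, C2, 6)
```

The two triples are different points of the realization space, yet no input-output experiment can tell them apart; this is why only the orbit is identifiable.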

Since the isotropy subgroup is trivial for all reachable or
observable realizations, its dimension is constant on $L^{co,cr}_{m,n,p}$, and hence
the orbits of $Gl_n(R)$ form a foliation $F$ of $L^{co,cr}_{m,n,p}$ of dimension $n(m+p)$
[ *-). The field of tangent spaces to the leaves forms an $n(m+p)$-dimensional
subbundle $\tau(F)$ of the bundle, called the tangent bundle to $F$. The quotient
bundle $\nu(F) = TL/\tau(F)$ is called the normal bundle to $F$.

Our  interest  is  not  in  the  universal  parametrization,  but  in  the 
orbits  under  the  action  of  Gln(R)  itself.  These  orbits  are  open,  and  the 
boundary  points  of  reachable  realizations  are  non  reachable  realizations. 

The explicit form of the closure of the orbits was addressed in [KM].

We shall endow the tangent bundle $TL^{co,cr}_{m,n,p}$ with a positive definite
metric

$$\langle \cdot , \cdot \rangle_S : T_S L \times T_S L \to R \quad \text{for all } S \in L^{co,cr}_{m,n,p} .$$

This metric is associated with the tolerance in the components of the
realization and is different from the metrics (observability, reachability and
Riccati) on the associated vector bundle (the state bundle) of the
principal fibre bundle $\pi : L \to M$ with structural group $Gl_n(R)$, discussed
by Delchamps [D]. A uniform metric, $P(S) = I_{n(m+n+p)}$, would for instance be
useful in the design of fixed point computer realizations of a given $m \times p$
regulator of McMillan degree $n$. Its induced norm is the Frobenius norm of
the realization matrix $\begin{bmatrix} A & B \\ C & 0 \end{bmatrix}$.

3.2  The  Robust  Design  Problem:  A  Geometric  Approach. 

Before  proceeding  with  our  system  design,  we  shall  prove  a  general 
result  on  sensitivity: 

Definition: Let $\Theta$ be an N-dimensional open subset of an affine space $A^N$ of
design parameters (configurations). By an observable, we shall mean any
smooth function $f : \Theta \to R$ which has no critical points (i.e. the gradient
is never zero).

Remarks: i) The reason for considering open subsets is that, typically
(but not exclusively) in our applications, $\Theta$ is obtained by cutting
out of the parameter space the inverse image of a finite set of points
under a continuous map from that space to the reals.

ii) The significance of an observable is that any two
configurations $\theta_1$ and $\theta_2$ in the parameter space are indiscernible by
observation of $f$ if $f(\theta_1) = f(\theta_2)$. This allows us to regard two
parametrizations yielding the same observable(s) as being the same (or
equivalent) for some purpose. In a systems context, an observable is
for instance the value of the transfer function (scalar case) at a
particular frequency, or the impulse response evaluated at a specific
instant. They are also referred to as "system functions" [F].

iii) The gradient of a function depends on the metric of the
space. The vanishing of a gradient at a point is independent of the
chosen metric.

Every such map induces a partition of $\Theta$ into equivalence classes; in
fact, these equivalence classes form what is commonly known as a foliation.
In this case, the submanifolds are the level surfaces of $f$, and have
dimension $N-1$. There exists a vector field normal (in terms of some
arbitrarily chosen Riemannian metric) to the leaves.


The  whole  issue  of  the  sensitivity  problem  is  now  to  find  the  points  on 
the  leaves  corresponding  to  a  maximal  "separation"  of  the  leaves  of  the 
foliation.  Of  course,  this  notion  needs  to  be  made  precise,  since  the 
leaves  are  densely  stacked. 

3.2.1  Riemannian  Metrics 

If $\Theta$ is paracompact, then a Riemannian structure $G$ can be put on $\Theta$ (or,
more exactly, on its tangent bundle). This means that for each $\theta \in \Theta$, a
symmetric, positive definite bilinear form $G_\theta$ is defined on the vector space
$T_\theta\Theta$, such that $G$ defines a metric on $T\Theta$, i.e. is a smooth section of the
vector bundle $T^0_2\Theta$. Let $\# : T^*\Theta \to T\Theta$ be the natural isomorphism of each
space $T^*_\theta\Theta$ with $T_\theta\Theta$. If $f$ is a smooth map, the gradient of $f$ is defined as
the element $df^\#$ of $T\Theta$ (i.e. the vector field corresponding under the map $\#$
to the differential form $df$). In local coordinates this is given by

$$\nabla_G f = g^{ij}\, \frac{\partial f}{\partial \theta^i}\, \frac{\partial}{\partial \theta^j},$$

where the summation convention is used. The matrix $(g^{ij})$ is the inverse of
the metric tensor $(g_{ij})$,

$$g_{ij}(\theta) = G\!\left( \frac{\partial}{\partial \theta^i}, \frac{\partial}{\partial \theta^j} \right)\Big|_\theta .$$

The squared norm of the gradient is

$$\|\nabla_G f\|^2 = G(\nabla_G f, \nabla_G f) = g^{ij}\, \frac{\partial f}{\partial \theta^i}\, \frac{\partial f}{\partial \theta^j} .$$




If $\Theta$ is foliated by $f$, then the tangent space $L_\theta$ to the leaf through $\theta$
is an $(N-1)$-dimensional subspace of $T_\theta\Theta$.
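
To make the dependence of the gradient on the metric concrete, the sketch below evaluates the squared gradient norm $g^{ij}\,(\partial f/\partial\theta^i)(\partial f/\partial\theta^j)$ for a diagonal metric. It assumes that the relative metric has $g^{ii} = \theta_i^2$, which is an assumption consistent with the generalized gradient components $\theta_i\,\partial f/\partial\theta_i$ used later in the text; the function names and the test point are also illustrative.

```python
def raised_gradient(partials, inverse_metric_diag):
    """Components g^{ii} * df/dtheta_i of the gradient for a diagonal metric."""
    return [g_inv * d for g_inv, d in zip(inverse_metric_diag, partials)]


def gradient_norm_sq(partials, inverse_metric_diag):
    """Squared norm g^{ij} (df/dtheta_i)(df/dtheta_j) for a diagonal metric."""
    return sum(g_inv * d * d for g_inv, d in zip(inverse_metric_diag, partials))


theta = (2.0, 3.0)
partials = (theta[1], theta[0])             # partial derivatives of f = theta_1 * theta_2

uniform_inv = (1.0, 1.0)                    # uniform metric: g^{ii} = 1
relative_inv = tuple(t * t for t in theta)  # assumed relative metric: g^{ii} = theta_i^2

norm_uniform = gradient_norm_sq(partials, uniform_inv)    # 3^2 + 2^2
norm_relative = gradient_norm_sq(partials, relative_inv)  # 2 * (theta_1 * theta_2)^2
```

The same function at the same point has different gradient norms under the two metrics, while a vanishing gradient would vanish under both.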

3.2.2  Extremal  Sensitivity  Theorem 

Points of extremal sensitivity with respect to an observable $f(\theta)$ are
determined by minimization of $L(\theta) = \tfrac{1}{2}\|\nabla_G f\|^2$ over the leaf characterized by
a particular value of the observable $f$.

The problem is to find the points on the leaves for which the effects
of an infinitesimal perturbation are minimized. A worst case analysis leads
to the minimization of the gradient norm $\|\nabla_G f\| = G(\nabla_G f, \nabla_G f)^{1/2}$, or
equivalently, but mathematically more conveniently, of the map

$$h = \tfrac{1}{2}\|\nabla_G f\|^2 .$$

This scalar field induces a vector field in the tangent space of the
leaf. However, note that $dh^\# = \tfrac{1}{2}\, dG(df^\#, df^\#)^\#$ is in general not tangent to
the leaf. Its projection on the tangent space to the leaf at $\theta$ yields (with
the factor $\tfrac{1}{2}$ absorbed in the multiplier) the tangent vector
$dG(df^\#, df^\#)^\# - \lambda\, df^\#$ to the leaf through $\theta$, for some $\lambda \in R$.
Equivalently, we could have worked directly with the Hamiltonian for the
constrained problem, as the points are constrained by the leaves of the
foliation ($f$ = const) of $\Theta$:

$$\tilde{h} = \tfrac{1}{2}\|\nabla_G f\|^2 - \lambda f .$$

Either  way,  it  leads  in  coordinate  free  form  to 

Theorem 2: If $f$ is an observable for the parameter space $(\Theta, G)$, then the
points of extremal sensitivity with respect to $f$ are implicitly determined
by the equation

$$dG(df^\#, df^\#)^\# - \lambda\, df^\# = 0 .$$

proof: The stated condition is the Euler-Lagrange equation for the
constrained optimization problem.


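The gradients of $h$ and $f$ are aligned at the extremal sensitivity
points. Write $f_\theta$ for the vector of first partial derivatives of $f$ and
$f_{\theta\theta}$ for its matrix of second partial derivatives. For the uniform
metric, the condition specializes to

$$( f_{\theta\theta} - \lambda I )\, f_\theta = 0 ,$$

while for the relative metric, which is useful in connection
with floating point arithmetic, the condition is

$$[\, \mathrm{diag}(\theta)\,\mathrm{diag}(f_\theta) + \mathrm{diag}(\theta^2)\, f_{\theta\theta} \,]\, \mathrm{diag}(\theta^2)\, f_\theta = \lambda\, \mathrm{diag}(\theta^2)\, f_\theta .$$

In the latter case, a simpler form is obtained by using the "generalized"
gradient $\hat\nabla f$ with components $\theta_i\, \partial f/\partial\theta_i$ instead, corresponding to the
generalized Hessian $H = \mathrm{diag}(\hat\nabla f) + \mathrm{diag}(\theta)\, f_{\theta\theta}\, \mathrm{diag}(\theta)$.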
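
The relative-metric condition can be traced componentwise. The derivation below is a sketch under the assumption that the relative metric is $g_{ij} = \delta_{ij}/\theta_i^2$ (the text does not spell the metric out); the shorthand $f_i$, $f_{ik}$ for partial derivatives is introduced here for convenience.

```latex
% Assumption: relative metric g_{ij} = \delta_{ij} / \theta_i^2, hence g^{ij} = \theta_i^2 \delta_{ij}.
\begin{align*}
h &= \tfrac12 \sum_i \theta_i^2 f_i^2 ,
   \qquad f_i = \frac{\partial f}{\partial\theta_i},\quad
          f_{ik} = \frac{\partial^2 f}{\partial\theta_i\,\partial\theta_k},\\
\frac{\partial h}{\partial\theta_k} &= \theta_k f_k^2 + \sum_i \theta_i^2 f_i f_{ik},\\
\theta_k^2\,\frac{\partial h}{\partial\theta_k} &= \lambda\,\theta_k^2 f_k
   \qquad\text{(gradient of $h$ parallel to gradient of $f$)},\\
(\theta_k f_k)(\theta_k f_k) + \sum_i (\theta_k f_{ki}\,\theta_i)(\theta_i f_i)
   &= \lambda\,(\theta_k f_k)
   \qquad\text{(after dividing by $\theta_k \neq 0$)}.
\end{align*}
```

The last line is the $k$-th component of $(H - \lambda I)\hat\nabla f = 0$ with the generalized gradient and generalized Hessian, which is the simpler form referred to above.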

We  state  what  was  just  shown  as  an  important 

Corollary: The extremal sensitivity points of $(\Theta, G)$, where $G$ is the uniform
or relative metric, are the points where the gradient $df^\#$ is in an
eigenspace of the generalized Hessian operator $H : T_\theta\Theta \to T_\theta\Theta$, i.e.

$$(H(f) - \lambda I)\, df^\# = 0 .$$

Example 1. Consider in $R^2$ the foliation $\theta_1\theta_2$ = constant. With the uniform
metric, the extremal sensitivity points are on the diagonals $\theta_1 = \pm\theta_2$. With
the relative metric, the generalized gradient is $\hat\nabla f = \theta_1\theta_2\,(1, 1)'$ and the
generalized Hessian is $H = \theta_1\theta_2 \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$.
The problem is degenerate: every point is an extremal sensitivity point.
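
The uniform-metric claim of Example 1 follows from a two-line computation, sketched here with the vector of first derivatives $f_\theta$ and the matrix of second derivatives $f_{\theta\theta}$ of $f = \theta_1\theta_2$ (notation chosen for this illustration).

```latex
% Valid on leaves with nonzero constant, where (\theta_1, \theta_2) \neq (0, 0).
\begin{align*}
f &= \theta_1\theta_2, \qquad
f_\theta = \begin{pmatrix}\theta_2\\ \theta_1\end{pmatrix}, \qquad
f_{\theta\theta} = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix},\\
(f_{\theta\theta} - \lambda I)\,f_\theta
  &= \begin{pmatrix}\theta_1 - \lambda\theta_2\\ \theta_2 - \lambda\theta_1\end{pmatrix} = 0
  \;\Longrightarrow\; \theta_1 = \lambda\theta_2,\ \theta_2 = \lambda\theta_1
  \;\Longrightarrow\; \lambda^2 = 1,\ \theta_1 = \pm\theta_2 .
\end{align*}
```

This agrees with the elementary observation that $\theta_1^2 + \theta_2^2$, the squared gradient norm, is smallest on the hyperbola $\theta_1\theta_2 = c$ where $|\theta_1| = |\theta_2|$.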

Example 2. Consider in $R^2$ the leaf $f(\theta) = 1$ of an ellipsoidal foliation
given by $f(\theta) = \theta_1^2/a^2 + \theta_2^2/b^2$. For the uniform metric, the gradient is
$(2\theta_1/a^2,\ 2\theta_2/b^2)'$ and the Hessian is $\mathrm{diag}(2/a^2, 2/b^2)$. The eigenvectors of the
Hessian are $(1,0)$ and $(0,1)$, corresponding to the extremal sensitivity points
$(\pm a, 0)$ and $(0, \pm b)$. If $|a| \le |b|$ then the former is a maximal sensitivity
point, and the latter a minimal sensitivity point. With the relative metric, one
finds the extremal sensitivity points $\theta_1 = \pm a/\sqrt{2}$ and $\theta_2 = \pm b/\sqrt{2}$, i.e.
the points where the diagonals of the enclosing rectangle, with sides $2a$ and
$2b$, intersect the ellipse.
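
The statements of Example 2 can be checked numerically by scanning the first quadrant of the ellipse. The sketch below is only an illustration: the semi-axes, the grid search, and all function names are assumptions, and the relative-metric norm uses the generalized gradient components $\theta_i\,\partial f/\partial\theta_i$.

```python
import math


def ellipse_point(a, b, t):
    """Point on the leaf theta_1^2/a^2 + theta_2^2/b^2 = 1."""
    return (a * math.cos(t), b * math.sin(t))


def partials(a, b, theta):
    """First partial derivatives of f = theta_1^2/a^2 + theta_2^2/b^2."""
    return (2.0 * theta[0] / a**2, 2.0 * theta[1] / b**2)


def norm_sq_uniform(a, b, theta):
    return sum(d * d for d in partials(a, b, theta))


def norm_sq_relative(a, b, theta):
    # Generalized gradient components theta_i * df/dtheta_i.
    return sum((t * d) ** 2 for t, d in zip(theta, partials(a, b, theta)))


def extremal_angles(a, b, norm_sq, samples=2001):
    """Grid search over the first quadrant; returns (argmin, argmax) angles."""
    angles = [0.5 * math.pi * k / (samples - 1) for k in range(samples)]
    value = lambda t: norm_sq(a, b, ellipse_point(a, b, t))
    return min(angles, key=value), max(angles, key=value)


a, b = 1.0, 2.0  # assumed semi-axes with |a| <= |b|
t_min_u, t_max_u = extremal_angles(a, b, norm_sq_uniform)
t_min_r, _ = extremal_angles(a, b, norm_sq_relative)

least_sensitive_uniform = ellipse_point(a, b, t_min_u)   # near (0, b)
most_sensitive_uniform = ellipse_point(a, b, t_max_u)    # near (a, 0)
least_sensitive_relative = ellipse_point(a, b, t_min_r)  # near (a, b) / sqrt(2)
```

On the ellipse the relative-metric norm reduces to $4(\cos^4 t + \sin^4 t)$, which does not depend on $a$ and $b$; this is why its minimum sits on the diagonal of the enclosing rectangle.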

4  APPLICATION  TO  ROBUST  REALIZATIONS 

For the application to linear system realizations and design it was
found fruitful to express the parametrization in terms of the components of
a factorization of the system Hankel matrix $H = OR$, where $O$ and $R$ are
respectively the observability and reachability matrices of the realization.
This does not quite solve the optimal realization problem, but it provides a
suboptimal solution, which is mathematically more tractable. The Hankel
matrix defined as a map with domain $L_{m,n,p}$ plays the role of a
multidimensional observable. The details for discrete time systems are
given in [GV]. In this paper the continuous time systems design under the
uniform metric is discussed, for square ($p = m$) systems only. The general
case is more tedious and will be published elsewhere.

Definitions: Let $L_2^m[0,\infty)$ be the Hilbert space of m-vector functions with
inner product $\langle x(\cdot), y(\cdot)\rangle = \int_0^\infty x(t)'y(t)\,dt$. The reachability operator
$\mathcal{R} : L_2^m[0,\infty) \to R^n$ for a realization $(A,B,C)$ is defined by
$\mathcal{R}u = \int_0^\infty e^{At}Bu(t)\,dt$. Its adjoint $\mathcal{R}^*$ is the operator
$\mathcal{R}^* : R^n \to L_2^m[0,\infty) : \mathcal{R}^*x = B'e^{A't}x$.

The observability operator is $\mathcal{O} : R^n \to L_2^p[0,\infty) : \mathcal{O}x = Ce^{At}x$. Since $\mathcal{R}$
and $\mathcal{O}$ have a finite dimensional range and domain respectively, they are
compact operators [K, p.157]. Furthermore, their composition $\mathcal{O}\mathcal{R}$ is also
compact [K, p.158]. Finally, we introduce the Hankel operator
$\mathcal{H} : L_2^m[0,\infty) \to L_2^p[0,\infty) : (\mathcal{H}u)(t) = \int_0^\infty h(t+\tau)u(\tau)\,d\tau$, where $h(t) = Ce^{At}B$ is the
impulse response of the realization $(A,B,C)$. It is readily verified that
indeed $\mathcal{H} = \mathcal{O}\mathcal{R}$. An operator $A : L_2^m[0,\infty) \to L_2^m[0,\infty)$ satisfying
$AA^* = A^*A = \mathrm{Id}$ (the identity operator) is called isometric. We shall also
assume that the set $\{e_i\}_{i=1}^n$ is the standard basis for $R^n$ and that the
functions $\{\varphi_k\}$ form a complete orthonormal basis in $L_2^m[0,\infty)$. Many
notations are used in such a setting. We found it relatively easy to use
the Dirac notation for the vectors and their duals (i.e. the bra and ket
notation). In this form we have:

$$\mathcal{H} = \sum_{u,v} h_{uv}\, |\varphi_u\rangle\langle\varphi_v| , \qquad \mathcal{R} = \sum_{i,j} r_{ij}\, |e_i\rangle\langle\varphi_j| , \qquad \mathcal{O} = \sum_{k,l} o_{kl}\, |\varphi_k\rangle\langle e_l| .$$

The matrix representations $[h_{ij}]$, $[r_{ij}]$ and $[o_{ij}]$ will be respectively
denoted by $\mathrm{Mat}(\mathcal{H})$, $\mathrm{Mat}(\mathcal{R})$ and $\mathrm{Mat}(\mathcal{O})$. By $\mathrm{Vec}(M)$ we mean the vector
formed by stacking the elements of the matrix $M$ columnwise.

It  is  now  possible  to  state  our  first  auxiliary  result: 

Lemma: Let $E : L_2^m[0,\infty) \to L_2^m[0,\infty)$ be such that $\mathrm{Tr}\, AE = 0$ for all
isometric operators $A$; then $E = 0$.

proof: Suppose $E$ has the singular value decomposition [K, p.261]

$$E = \sum_j \sigma_j\, |u_j\rangle\langle v_j| ,$$

where $\{u_j\}$ and $\{v_j\}$ are orthonormal sets in $L_2^m[0,\infty)$; then choosing $A$ as
$\sum_j |v_j\rangle\langle u_j|$ yields $\sum_j \sigma_j = 0$. Since the singular values are nonnegative, we
must have all $\sigma_j = 0$ and hence $E = 0$.
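
The trace computation used in the proof can be written out explicitly. The lines below are a sketch of that single step, using only the orthonormality $\langle u_j | u_k\rangle = \delta_{jk}$ and $\langle v_j | v_j\rangle = 1$ of the singular vectors.

```latex
\begin{align*}
AE &= \Big(\sum_j |v_j\rangle\langle u_j|\Big)\Big(\sum_k \sigma_k\, |u_k\rangle\langle v_k|\Big)
    = \sum_{j,k} \sigma_k\, \langle u_j | u_k\rangle\, |v_j\rangle\langle v_k|
    = \sum_j \sigma_j\, |v_j\rangle\langle v_j| ,\\
\operatorname{Tr} AE &= \sum_j \sigma_j\, \langle v_j | v_j\rangle = \sum_j \sigma_j .
\end{align*}
```

A sum of nonnegative numbers vanishes only if every term vanishes, which gives the conclusion of the lemma.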

In order to apply the theory developed in the previous section, we
consider the affine space formed by the matrix elements of $\mathcal{O}$ and $\mathcal{R}$, so that
the parameter vector is $\theta' = [\mathrm{Vec}(\mathrm{Mat}(\mathcal{R})')',\ \mathrm{Vec}(\mathrm{Mat}(\mathcal{O}))']$.

Analogous to the discrete case [GV], we shall consider the observables
$f_A(\theta) = \mathrm{Tr}\, A(\mathcal{H} - \mathcal{O}\mathcal{R})$. Denote by $M_0(f_A)$ the leaf on which $f_A$ is constant,
zero say; then we have the

Theorem 3: The extremal sensitivity points of $M_0(f_A)$ have the property that

$$\mathcal{R}\mathcal{R}^* = \mathcal{O}^*\mathcal{O} .$$

proof: Substitute the bra-ket expansions in the expression for the
observable $f_A(\theta)$, and use the orthonormality of the bases. This reduces the
continuous time problem to the matrix problem, solved in [GV], where it was
shown, based on the corollary to Theorem 2, that the extremal sensitivity
points satisfy

$$\mathrm{Mat}(\mathcal{R})\,\mathrm{Mat}(\mathcal{R})' = \mathrm{Mat}(\mathcal{O})'\,\mathrm{Mat}(\mathcal{O}) .$$

Re-expressing $\mathrm{Mat}(\mathcal{R})\,\mathrm{Mat}(\mathcal{R})'$ and $\mathrm{Mat}(\mathcal{O})'\,\mathrm{Mat}(\mathcal{O})$ in the basis $\{e_i\}$
gives then the condition in terms of the original operators: $\mathcal{R}\mathcal{R}^* = \mathcal{O}^*\mathcal{O}$. ■

Corollary: The minimal sensitivity realizations on the $Gl_n(R)$-orbit of a
minimal realization of $\mathcal{H}$ are the essentially balanced (i.e. balanced modulo
an orthogonal transformation) realizations.

proof: Observe first that the condition for an extremum did not depend on the
choice of $A$, and therefore must be true for all isometries, or observables
$f_A$. All extremal sensitivity points of $f_A$ belong therefore to the
intersection $\bigcap_A M_0(f_A)$. By the lemma, the intersection of the manifolds
$M_0(f_A)$ is the submanifold characterized by $\mathcal{H} = \mathcal{O}\mathcal{R}$, i.e. the orbit of
the system with Hankel operator $\mathcal{H}$ under the action of $Gl_n(R)$.
Then, by the previous theorem, $\mathcal{R}\mathcal{R}^* = \mathcal{O}^*\mathcal{O}$, so that

$$\langle x, \mathcal{R}\mathcal{R}^* y\rangle = \langle x, \mathcal{O}^*\mathcal{O} y\rangle \quad \forall x, y \in R^n ,$$

$$\langle \mathcal{R}^*x, \mathcal{R}^*y\rangle = \langle \mathcal{O}x, \mathcal{O}y\rangle ,$$

which in integral form is

$$x' \int_0^\infty e^{At}BB'e^{A't}\,dt\; y = x' \int_0^\infty e^{A't}C'Ce^{At}\,dt\; y .$$
By definition, this states the equality of the reachability Gramian with the
observability Gramian. Realizations having this property are essentially
balanced, hence their name, as an orthogonal similarity transformation will
make them truly balanced (equal and diagonal Gramians), in the sense of
Moore [M]. The observables $f_A(\theta)$ are invariant with respect to such
transformations. The second variation property is used to show that the
extremal solutions obtained indeed correspond to minimum sensitivity
solutions. Finally, all infinitesimal variations in the parameters of the
factorizations of the Hankel matrix lead to second order variations in $H$.
But small (first order) variations in the reachability and observability
matrices are themselves linked to first order variations in the realization
parameters. It follows thus that any essentially balanced realization is
truly a minimum sensitivity realization. ■

As  shown  by  the  main  theorem,  it  suffices  to  find  an  essentially 
balanced  realization  of  the  given  system.  The  characterization  as  a 
factorization  of  the  Hankel  matrix  is  therefore  independent  of  the  size  of 
the  Hankel  matrix  considered,  as  long  as  it  is  large  enough  to  specify  the 
given  input-output  relation. 
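
For a first-order system the link between balancing and minimal sensitivity can be checked by hand, and the sketch below does so numerically. It assumes a scalar realization $(A,B,C) = (-a, b, c)$ with $a > 0$, for which the reachability and observability Gramians are $B^2/(2a)$ and $C^2/(2a)$. A state scaling $z = tx$ keeps the product $BC$ fixed, and under the uniform metric the squared gradient norm of $f = BC$ is $B^2 + C^2$. The function names and numerical values are illustrative assumptions, not part of the original development.

```python
import math


def gramians(a, B, C):
    """Reachability and observability Gramians of the scalar system (-a, B, C)."""
    return B * B / (2.0 * a), C * C / (2.0 * a)


def rescale(b, c, t):
    """State scaling z = t x maps (b, c) to (t b, c / t)."""
    return t * b, c / t


def sensitivity(B, C):
    """Squared gradient norm of f = B C under the uniform metric."""
    return B * B + C * C


a, b, c = 1.0, 4.0, 1.0                  # assumed example values
t_bal = math.sqrt(abs(c / b))            # scaling that equalizes the Gramians
B_bal, C_bal = rescale(b, c, t_bal)
W_r, W_o = gramians(a, B_bal, C_bal)

# Compare with other points on the same orbit.
others = [sensitivity(*rescale(b, c, t)) for t in (0.25, 0.4, 0.7, 1.0, 2.0)]
```

In this scalar case the orbit is the hyperbola of Example 1, and the balanced point $|B| = |C|$ is exactly the point of smallest gradient norm on it.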


5  CONCLUSIONS 


The optimal sensitivity properties for the continuous time realizations
have been derived. By using expansions in a complete orthonormal basis in
the function space $L_2$, the problem was reduced to the discrete
time optimality problem, solved in [GV]. In both cases the balanced
realizations are therefore optimal. The balanced realizations have been
widely used in model reduction methods, even though no clear optimality
properties were known about the resulting reduced order models [M,G1].

We have restricted our discussion to square systems ($m = p$) and
minimal realizations. Extensions of the theory are in progress. It seems
intuitively clear that one could further exploit the redundancy of a
realization by deliberately using nonminimal realizations. Finally, the
idea in the proof of the main sensitivity theorem leads to gradient type
algorithms for the optimal sensitivity realizations. Some preliminary
remarks regarding these appear in [GV].

ABSTRACT

This paper considers a switched parameter
stochastic linear system whose state equations
depend on a finite state Markov process. The
decomposition of the system and the process
together into fast and slow components is investigated
in the paper when both are singularly
perturbed. The results can be shown to hold when
the process is independent of the original system,
is ergodic, and the matrices of the different
models of the system commute.


INTRODUCTION 

This paper considers the limiting behavior of
singularly perturbed switched parameter linear
stochastic models. Such models occur in many
detection-estimation schemes [1] and in the study
of multiple control systems [2]. They have also
been described more recently as hybrid systems [3].
The paper is concerned with the properties of such
models when both the continuous and the underlying
discrete-event processes are singularly perturbed.
Stochastic linear singularly perturbed systems have
already been considered, and their properties are
well documented [4]. Similarly, [5] considers the
aggregation of states for singularly perturbed
discrete-event Markov processes. Here we study the
combination of both types of behavior for singularly
perturbed linear stochastic models which depend
on a discrete-event Markov process, also singularly
perturbed.

The system model is assumed to have the
following set of state equations

$$\dot{x}_1 = A_1[u(t)]\, x_1 + A_{12}[u(t)]\, x_2 + w(t) \qquad (1)$$

$$\mu\, \dot{x}_2 = A_{21}[u(t)]\, x_1 + A_2[u(t)]\, x_2 + B_2\, w(t) \qquad (2)$$

where $w(t)$ is a white Gaussian noise vector, and
where $\mu > 0$ is a small parameter. Thus $x_1(t)$
represents the slow mode of the system, and $x_2(t)$
the fast mode. The process $u(t)$ is a discrete-event
Markov chain with transition probability
matrix $P(\tau)$, and is assumed to be independent of
$x(t)$ and ergodic. The singularly perturbed case
assumes that the $M \times M$ transition probability matrix
$P(\tau)$ may be block partitioned into $P_{11}(\tau)$ (of
dimension $M_1 \times M_1$), $P_{12}(\tau)$, $P_{21}(\tau)$, and $P_{22}(\tau/\epsilon)$ (of
dimension $M_2 \times M_2$), where $\epsilon > 0$ is a small parameter.
The parameter $\epsilon$ represents the fact that some of
the states of $u(t)$ are assumed to have fast
transition probabilities. Thus $u(t)$ may be
described as having two sets of discrete states, $S_1$
containing $M_1$ states, and $S_2$ containing $M_2$ states,
representing the slow and fast components, respectively.
The methods developed in [5] may be used
to aggregate the fast states into a single state so
that as far as the slow time-scale is concerned the
process may be approximated by an $(M_1+1)$-state slow
Markov process.

The limiting behavior of the overall system
depends on the relative size of $\mu$ and $\epsilon$ as they
both tend to zero. This reflects the fact that the
fast time-scales of the discrete process and the
continuous one may not be equally fast. While the
general case is considered in the paper, this
summary will only address the problem when $\mu$ and $\epsilon$
have the same order of magnitude, and are assumed
to be equal. Furthermore, for simplicity (and due
to the possibility of transforming the original
system into a decoupled one) we shall concentrate
on the decoupled case, i.e., $A_{12} = 0$ and $A_{21} = 0$.
A summary of the main results is given in the
following sections.

FAST  MODE  SYSTEM 

The analysis of the fast subsystem is relatively
straightforward and depends on whether it is
of interest in its own right or as an input to slow
subsystems, as discussed in [4]. If it is of
interest as an input to a slow subsystem, then as $\mu$
tends to zero the process $x_2(t)$ tends to a white
noise process

$$x_2(t) = -A_2^{-1} B_2\, w(t) + \text{error} \qquad (3)$$

where it is assumed that all the values of $A_2$ are
stable, and where the dependence on $u(t)$ is omitted
for simplicity. Hence, the fast process behaves in
the limit as a switched parameter white Gaussian
noise whose covariance $A_2^{-1} B_2 Q B_2' A_2^{-T}$ (with $Q$ the
covariance of $w(t)$) switches
among the $M_1$ values based on the slow states of the
underlying Markov process $u(t)$. The question of
the involvement in this limit of the fast component
of $u(t)$ is still open. The error can be expressed
in terms of the integral of $x_2(t)$ and it can be
shown that the mean-squared error is $O(\mu)$, as in
the standard singularly perturbed case.
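
The switching covariance of the limiting white noise can be tabulated mode by mode. The sketch below computes $A_2^{-1} B_2 Q B_2' A_2^{-T}$ for each value of a two-state fast matrix $A_2$; the helper names, the two hypothetical modes, and the choice $Q = I$ are assumptions made purely for illustration.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def inverse_2x2(X):
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def limiting_covariance(A2, B2, Q):
    """Covariance A2^{-1} B2 Q B2' A2^{-T} of the limit -A2^{-1} B2 w(t)."""
    gain = matmul(inverse_2x2(A2), B2)
    return matmul(matmul(gain, Q), transpose(gain))


# Assumed data: two slow modes, each with its own stable A2.
Q = [[1.0, 0.0], [0.0, 1.0]]
B2 = [[1.0, 0.0], [0.0, 1.0]]
A2_by_mode = {
    "mode_1": [[-1.0, 0.0], [0.0, -2.0]],
    "mode_2": [[-2.0, 0.0], [0.0, -4.0]],
}
covariances = {mode: limiting_covariance(A2, B2, Q)
               for mode, A2 in A2_by_mode.items()}
```

The faster the fast dynamics in a given mode (larger $|A_2|$), the smaller the covariance of the equivalent white noise seen by the slow subsystem in that mode.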

When the limit is required for the purpose of
analyzing the fast state $x_2(t)$ directly, the
analysis needs to be carried out in the stretched
time-scale $(t - t_i)/\mu$ after appropriate scaling (as
mentioned in [6], for example) of the white noise
process to obtain finite variance for $x_2(t)$. The
time instants $\{t_i\}$ represent the transition times
from the slow states of the process to a fast state
of the process. In this case it can be shown that
in the stretched time-scale the fast subsystem may
be modeled as a switched parameter process depending
only on the fast states of the Markov chain.
When $u(t)$ takes values among the slow states, then
in the stretched time-scale $x_2(t)$ behaves approximately
as a time-invariant process with
constant parameters held at their values at the
last slow transition.

SLOW  MODE  SYSTEM 

The analysis of the slow modes of the system
is based on writing the solution of $x_1(t)$ as a
function of the fast states for the duration of
intervals of transitions among the fast states of
$u(t)$. The solution is given as a standard state
equation solution of a time varying linear system
with white noise input $w(t)$. The time varying
nature stems from the dependence of the system
matrix on the different values of the fast states
of $u(t)$. Let these values be denoted by $A_1[u_{fi}]$,
where $\{u_{fi},\ i = 1, 2, \ldots, M_2\}$ are the fast states of
$u(t)$. A crucial assumption for the proof of the
results is that these matrices $A_1[u_{fi}]$ commute with
each other. It is shown, by finding the mean-squared
error of the approximation and taking its
limit as $\mu$ tends to zero, that the approximate model
for $x_1(t)$ during the fast transitions is as follows

$$\dot{\bar{x}}_1(t) = \bar{A}_1\, \bar{x}_1(t) + w(t) \qquad (4)$$

where $\bar{A}_1$ is the statistical average value of
$A_1[u_{fi}]$. Hence during the fast transition intervals
the mean-squared error between $x_1$ and $\bar{x}_1$ is
$O(\mu)$ and tends to zero as $\mu$ tends to zero.
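
A minimal sketch of how such an averaged matrix might be formed is given below. It assumes that the "statistical average" is taken with respect to the stationary distribution of the fast states, and that the fast chain is represented by a one-step row-stochastic matrix; both are assumptions of this illustration, as are the function names and the numerical data.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def stationary_distribution(P, iterations=500):
    """Stationary row vector of a row-stochastic matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def average_matrix(matrices, weights):
    """Weighted average sum_i weights[i] * matrices[i]."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * M[r][c] for w, M in zip(weights, matrices))
             for c in range(cols)] for r in range(rows)]


# Assumed fast-state transition matrix and system matrices A_1[u_fi].
P_fast = [[0.9, 0.1], [0.3, 0.7]]
A1_fast = [
    [[-1.0, 0.0], [0.0, -2.0]],
    [[-3.0, 0.0], [0.0, -1.0]],
]

# The commutativity assumption of the text, checked for this example.
matrices_commute = matmul(A1_fast[0], A1_fast[1]) == matmul(A1_fast[1], A1_fast[0])

pi_fast = stationary_distribution(P_fast)
A1_bar = average_matrix(A1_fast, pi_fast)
```

For this data the stationary distribution is $(0.75, 0.25)$, so the aggregated fast state carries the single matrix $\bar{A}_1 = \mathrm{diag}(-1.5, -1.75)$ in the slow model.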

The resulting approximation implies that the
slow process can be approximated by a switched
parameter process depending on the aggregated slow
process $\bar{u}(t)$ that has $M_1+1$ slow states given
by $u_{si}$, $i = 1, 2, \ldots, M_1+1$, where $u_{si} \in S_1$ for
$i \le M_1$, and where the value of $A_1$ as a function of
the last state (the aggregated state of the fast
states) is equal to $\bar{A}_1$.

It is important to note that the proof is
derived by showing a mean-squared limiting procedure
and not a weaker form of convergence. The
derivation is based on finding a set of differential
equations for the conditional means of the
various terms of the mean-squared error, conditional
on the last value of the state of the Markov
chain. The differential equations become a set of
singularly perturbed differential equations due to
the dependence on $P_{22}(t/\mu)$. The resulting expectations
therefore tend to zero as $\mu$ tends to zero.
The proof also allows the derivation of first order
approximations to the error for nonzero $\mu$.

CONCLUDING  REMARKS 

This note considered the limiting behavior of
a singularly perturbed stochastic linear system
with switched parameters which depend on a singularly
perturbed finite state Markov process. The
two main approximations that become possible with
such a model are as follows. The fast continuous process can
be approximated by a white noise with random covariance
that switches according to the slow states of
the Markov process. The slow continuous process
can be approximated by adding an additional state
to the slow Markov process that yields an average
value of the system matrix over all its values
based on the fast states. The results depend on
two crucial assumptions: the ergodicity of the
Markov chain, and the fact that the values of the
system matrix commute. The results allow the
derivation of higher order correcting terms to the
approximations.

It is crucial to relax some of the restrictions
imposed by the proofs derived in this case.
Also, the case when the time-scales of the fast
Markov process are different from the time-scales
of the fast subsystem needs further study. The
motivation for studying this problem is in its
potential for deriving simplified filtering schemes
for switched parameter systems. Typically these
schemes require expanding memory as more observation
samples are taken. The aggregations and
approximations derived in this note may be helpful
in obtaining approximate implementable schemes.

ABSTRACT

The effect of quantized control on an
otherwise linear singularly perturbed system is
analyzed. Three cases are studied: open loop
control, closed loop control with small quantization
step size, and closed loop control with
large quantization step size. The results of a
numerical example are also given to demonstrate
the analytical techniques described in the paper.

1.  INTRODUCTION 

This work examines the effect of quantization
on a singularly perturbed linear control
system. Singular perturbation theory is often
used for dynamic systems possessing both slow and
fast dynamics to simplify the analysis and
control design [1,2]. One of the requirements
under which a dynamic system may be separated
into slow and fast models using standard singular
perturbation techniques is that the system is
smooth with respect to its variables including
the control input and continuous in time [1,3].
However, in many applications the actuators
controlling the system have discrete states
supplying piecewise-constant or quantized control
so that the smoothness requirement is not met.
This type of actuator includes stepper motors and
certain types of hydraulic and pneumatic devices
[4,5]. Quantization may also occur as a result
of a digital controller quantizing either the
measurements or the signal to the actuator. As
shown in this paper, these systems may still be
separated into slow and fast models using the
techniques described here.

The system under consideration is linear,
time-invariant and is represented by

$$\dot{x}(t) = A_{11} x(t) + A_{12} z(t) + B_1 u(t) \tag{1}$$

$$\mu \dot{z}(t) = A_{21} x(t) + A_{22} z(t) + B_2 u(t) \tag{2}$$

where $\mu > 0$ is small [1]. The quantized control
input, u(t), may represent an open loop command
signal or a closed loop feedback signal. Both
cases are analyzed in this paper.

If $A_{22}$ is invertible, the system in (1)-(2)
may be decoupled using the transformation cited
in [6] with the resulting form,

$$\dot{\xi}(t) = A_0 \xi + B_0 u; \quad \xi(t_0) = \xi_0 \tag{3}$$

$$\mu \dot{\eta}(t) = A_2 \eta + B_2 u; \quad \eta(t_0) = \eta_0 \tag{4}$$

where $A_0 = A_{11} - A_{12} A_{22}^{-1} A_{21}$,
$B_0 = B_1 - A_{12} A_{22}^{-1} B_2$ and $A_2 = A_{22}$.
The new variables $\xi$ and $\eta$ are expressed
up to an error of order $O(\mu)$ by

$$\begin{bmatrix} \xi \\ \eta \end{bmatrix} =
\begin{bmatrix} I_n & 0 \\ A_{22}^{-1} A_{21} & I_m \end{bmatrix}
\begin{bmatrix} x \\ z \end{bmatrix} \tag{5}$$

where $I_n$ and $I_m$ are n×n and m×m identity matrices,
respectively. The natural modes of (3)
correspond to the slow modes of the original
system and the natural modes of (4) correspond to
the fast modes of the original system.

If a stretched time-scale $\tau = (t - t_0)/\mu$ is
defined, (4) may be expressed as

$$\frac{d\eta(\tau)}{d\tau} = A_2 \eta(\tau) + B_2 u; \quad \eta(0) = \eta_0 \tag{6}$$

where $\eta(\tau)$ denotes the transformed function
$\eta(\mu\tau + t_0) = \eta(t)$. In this equation, $\eta(\tau)$ is
composed of a transient due to the initial
conditions and may have a steady-state value with
respect to $\tau$. This fast transient in $\tau$ is known
as the boundary layer solution and its contribution
to $\eta(t)$ is only significant for t in a short
interval $[t_0, t_0 + \delta]$ known as the boundary layer.
Also, note that $\xi$ remains relatively constant
with respect to $\tau$ so that $\xi(\tau) = \xi_0 + O(\mu)$. Without
loss of generality the paper considers decoupled
systems of the form given in (3), (4), or (6).

The following is an outline of the paper.
Section 2 discusses the effect of open loop
piecewise-constant control input on the decoupled
system. The effect of closed loop quantized
control on (3)-(4) is shown in Section 3 for
cases involving both small and large quantization
steps. The latter case is demonstrated via a
numerical example in Section 4 involving system
(1)-(2). Section 5 concludes the paper.


2. OPEN LOOP QUANTIZED CONTROL


This section considers singularly perturbed
linear systems excited by an open loop piecewise
constant control input. This control input is
assumed to be separable into a slow switching
component, $u_s$, and a fast switching component,
$u_f$. The slow and fast modes of the system are
assumed to be decoupled as in (3)-(4) with input
$u = u_s + u_f$. The magnitude of the input is
restricted to be in a finite class of allowable
levels, i.e., quantization levels. Any change in
the input is characterized by a discontinuity
with minimum height equal to the smallest
quantization step.

The decomposition of u may be defined as
follows

$u_s(t) = u(t) = \text{constant}$ for $t \in [t_i, t_{i+1}]$, such
that the switching interval is
large relative to $\mu$,

$u_f(t) = u(t) - u_s(t)$.


The response of the decoupled system (3)-(4)
due to the switching of $u_s$ and $u_f$ is examined.
It is found that any change to $u_s$ excites the
fast modes of (4), requiring analysis of the
boundary layer. Furthermore, control of the fast
modes in the boundary layer requires application
of the $u_f$ control immediately following a switch
in $u_s$. If $\bar{u}_f$ represents the steady state
component of $u_f$ (with respect to the fast time-scale),
then the control $(u_f - \bar{u}_f)$ has $O(\mu)$ effect
on the slow variable $\xi(t)$. The $\bar{u}_f$ control
affects $\xi(t)$ in the same manner as $u_s$. (Actually,
the original u may be decomposed so that $u_s$
contains $\bar{u}_f$ and $u_f$ has no steady state value).
Some details of this analysis are given below.

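The response of $\eta(t)$ and $\xi(t)$ due to a
switch in $u_s$ is examined first. Let $u_s$ switch
from $u_{s1}$ to $u_{s2}$ at $t_1$ and define the expanded
time variable $\tau = (t - t_1)/\mu$. Equation (6) becomes

$$\frac{d\eta}{d\tau} = A_2 \eta(\tau) + B_2 u_{s2}; \quad \eta(0) = -A_2^{-1} B_2 u_{s1}. \tag{7}$$

In the boundary layer $\xi(\tau) = \xi(0) + O(\mu)$. The
solution to (7) is given by

$$\eta(\tau) = \Phi_f(\tau,0)\,\eta(0) + (\Phi_f(\tau,0) - I) A_2^{-1} B_2 u_{s2} \tag{8}$$

where $\Phi_f(\tau,\tau_1) = \exp[A_2(\tau - \tau_1)]$. Note that if $A_2$
is a stable matrix then,

$$\lim_{\tau\to\infty} \eta(\tau) = -A_2^{-1} B_2 u_{s2} = \lim_{\mu\to 0} \eta(t)$$

for t outside of the boundary layer (i.e. for
$t \in (t_1 + \delta, t_2]$, where $\delta > 0$). The response of $\xi(t)$
for $t \in (t_1 + \delta, t_2]$ can be expressed as

$$\xi(t) = \Phi_s(t,t_1)\,\xi(t_1) + (\Phi_s(t,t_1) - I) A_0^{-1} B_0 u_{s2} \tag{9}$$

where $\Phi_s(t,t_1) = \exp[A_0(t - t_1)]$.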
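
The boundary layer response in (8) can be checked numerically. The following is a minimal sketch for a scalar fast subsystem; the coefficient values and the function names `eta_closed_form` and `eta_rk4` are assumptions made only for illustration and do not come from the paper. It evaluates the closed form (8) and compares it with a direct Runge-Kutta integration of (7).

```python
import math

def eta_closed_form(tau, a2, b2, u_s1, u_s2):
    """Scalar version of (8): fast state after u_s switches from u_s1 to u_s2."""
    eta0 = -b2 * u_s1 / a2            # eta(0) = -A2^{-1} B2 u_s1, as in (7)
    phi = math.exp(a2 * tau)          # Phi_f(tau, 0) = exp(A2 * tau)
    return phi * eta0 + (phi - 1.0) * b2 * u_s2 / a2

def eta_rk4(tau_end, a2, b2, u_s1, u_s2, steps=2000):
    """Integrate (7) with classical RK4 as an independent check of (8)."""
    def rhs(eta):
        return a2 * eta + b2 * u_s2
    h = tau_end / steps
    eta = -b2 * u_s1 / a2
    for _ in range(steps):
        k1 = rhs(eta)
        k2 = rhs(eta + 0.5 * h * k1)
        k3 = rhs(eta + 0.5 * h * k2)
        k4 = rhs(eta + h * k3)
        eta += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return eta

# Hypothetical values: a stable fast pole and a switch from u_s1 = 1 to u_s2 = 3.
a2, b2, u_s1, u_s2 = -2.0, 1.0, 1.0, 3.0
print(eta_closed_form(1.0, a2, b2, u_s1, u_s2), eta_rk4(1.0, a2, b2, u_s1, u_s2))
print(eta_closed_form(20.0, a2, b2, u_s1, u_s2), -b2 * u_s2 / a2)
```

For a stable $A_2$ the second line illustrates the limit stated after (8): for large $\tau$ the fast state settles at $-A_2^{-1} B_2 u_{s2}$.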

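If the fast system is to be controlled in
the boundary layer $[t_1, t_1 + \delta]$, $u_f(\tau)$ must be added
in equation (7) ($\bar{u}_f$ is assumed to be zero). The
resulting system becomes

$$\frac{d\eta}{d\tau} = A_2 \eta(\tau) + B_2 (u_{s2} + u_f(\tau)); \quad \eta(0) = -A_2^{-1} B_2 u_{s1}. \tag{10}$$

Assuming $u_f(\tau)$ is bounded, $\xi(\tau) = \xi(0) + O(\mu)$. Let
$\tau_i$ and $u_{fi}$ denote the switching times and the
corresponding values, $u_f(\tau_i)$, of $u_f(\tau)$, respectively.
Then the solution to (10) becomes

$$\eta(\tau) = \Phi_f(\tau,0)\,\eta(0) + [\Phi_f(\tau,0) - I] A_2^{-1} B_2 u_{s2}
+ \sum_{i=0}^{n-1} [\Phi_f(\tau,\tau_i) - \Phi_f(\tau,\tau_{i+1})] A_2^{-1} B_2 u_{fi} \tag{11}$$

where n is the number of switches and $u_{fn} = 0$.
Again note that $\eta(\tau) \to -A_2^{-1} B_2 u_{s2}$. The effect of
$u_f(\tau)$ on $\xi(t)$ is $O(\mu)$ as can be seen from the
solution of (3)

$$\xi(t) = \Phi_s(t,t_1)\,\xi(t_1) + [\Phi_s(t,t_1) - I] A_0^{-1} B_0 u_{s2} + O(\mu) \tag{12}$$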

A time-varying version of the system in (3)-(4)
with open loop piecewise constant control may
be similarly analyzed. The constraint on the
system  is  that  lA^I.  IB^I,  IA*0I,  IB0I  are 
bounded . 

Therefore,  linear  systems  exhibiting  a 
separation  into  fast  and  slow  dynamics  without 
control  input  also  exhibit  this  behavior  when  the 
input  is  piecewise  constant  and  separable  into 
fast  and  slow  components.  Such  systems  can  be 
reduced into a slow model in the slow time-scale
t, and a fast model in the fast time-scale $\tau$.
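
The reduction described above can be illustrated with a small numerical experiment. This is a sketch under stated assumptions, not the paper's example: the scalar coefficients below are hypothetical, the input is held constant after a single switch, and `simulate_full` and `slow_model` are illustrative names. The full system (1)-(2) is integrated with a step tied to $\mu$, and its slow state is compared with the closed-form slow model (9).

```python
import math

# Hypothetical scalar coefficients of (1)-(2), chosen only for illustration.
A11, A12, B1 = -1.0, 1.0, 0.0
A21, A22, B2 = 1.0, -2.0, 1.0

def simulate_full(mu, t_end, x0, z0, u):
    """RK4 integration of the full two-time-scale system with constant u."""
    def rhs(x, z):
        return (A11 * x + A12 * z + B1 * u,
                (A21 * x + A22 * z + B2 * u) / mu)
    steps = int(round(t_end / (mu / 4.0)))   # step size limited by the fast mode
    h = t_end / steps
    x, z = x0, z0
    for _ in range(steps):
        k1x, k1z = rhs(x, z)
        k2x, k2z = rhs(x + 0.5 * h * k1x, z + 0.5 * h * k1z)
        k3x, k3z = rhs(x + 0.5 * h * k2x, z + 0.5 * h * k2z)
        k4x, k4z = rhs(x + h * k3x, z + h * k3z)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        z += h * (k1z + 2.0 * k2z + 2.0 * k3z + k4z) / 6.0
    return x

def slow_model(t, xi0, u):
    """Closed form (9) for the slow model, with A0 and B0 from the decoupling."""
    a0 = A11 - A12 * A21 / A22
    b0 = B1 - A12 * B2 / A22
    phi = math.exp(a0 * t)
    return phi * xi0 + (phi - 1.0) * b0 * u / a0

for mu in (0.01, 0.001):
    err = abs(simulate_full(mu, 2.0, 2.0, 0.0, 1.0) - slow_model(2.0, 2.0, 1.0))
    print(mu, err)
```

Note that the full simulation needs a step proportional to $\mu$, while the slow model is evaluated in closed form; the printed error shrinks as $\mu$ is reduced, in line with the $O(\mu)$ claim.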

3.  CLOSED  LOOP  QUANTIZED  CONTROL 
3.1  Small  Quantization  Steps 

Quantization effects in the closed loop
system may be represented by a white Gaussian
noise input if the quantization steps are small
[7]. Singularly perturbed systems with white
noise input have been studied previously, see,
for example [8,9]. This work differs from other
papers on stochastic singularly perturbed systems
in that along with the noise input which is white
in the normal time-scale, t, there is also a
noise input which is white in the expanded time
scale, $\tau$ (used to implement the fast controller).

The control, u(t), in equations (1)-(2) for
linear state feedback may be represented by

$$u = -K_s x - K_f z + w_s(t) + w_f(\tau) \tag{13}$$

where $w_s$ is Gaussian white noise in t and $w_f$ is
white noise in $\tau$. These noises are assumed to be
independent with autocorrelations given by

$$E\{w_s(t_1) w_s'(t_2)\} = Q_s \delta(t_1 - t_2) \quad\text{and}\quad
E\{w_f(\tau_1) w_f'(\tau_2)\} = Q_f \delta(\tau_1 - \tau_2) \tag{14}$$

(The independence assumption may not be very
accurate and merits further investigation). The
effects of $w_s$ and $w_f$ on the mean and covariance
of $\eta$ and $\xi$ are examined below.

Substituting (13) into (1)-(2) yields the
new closed loop system

$$\dot{x} = (A_{11} - B_1 K_s) x + (A_{12} - B_1 K_f) z + B_1 (w_s + w_f) \tag{15}$$

$$\mu \dot{z} = (A_{21} - B_2 K_s) x + (A_{22} - B_2 K_f) z + B_2 (w_s + w_f) \tag{16}$$

This system may then be decoupled using the
transformation mentioned in Section 1. The
resulting system will be as in (3)-(4) with input
$u = (w_s + w_f)$ and system matrices given by

$$\begin{aligned}
A_0 &= A_{11} - B_1 K_s - (A_{12} - B_1 K_f)(A_{22} - B_2 K_f)^{-1}(A_{21} - B_2 K_s) \\
B_0 &= B_1 - (A_{12} - B_1 K_f)(A_{22} - B_2 K_f)^{-1} B_2 \\
A_2 &= A_{22} - B_2 K_f
\end{aligned} \tag{17}$$

In order to examine the variance of $\eta$, the
time scale is expanded by defining $\tau = (t - t_0)/\mu$ and
(6) is rewritten using the definitions in (17)

$$\frac{d\eta}{d\tau} = A_2 \eta(\tau) + B_2 (w_s + w_f); \quad \eta(0) = \eta_0 \tag{18}$$

Since $w_s$ and $w_f$ are independent, the variance of
$\eta(\tau)$, $P_\eta(\tau)$, consists of a component due to $w_s$,
$P_{\eta s}(\tau)$, and a component due to $w_f$, $P_{\eta f}(\tau)$, i.e.,

$$P_\eta = P_{\eta s} + P_{\eta f} \tag{19}$$

$P_{\eta f}$ is the solution of the following Liapunov
equation [10]

$$\frac{dP_{\eta f}}{d\tau} = A_2 P_{\eta f} + P_{\eta f} A_2' + B_2 Q_f B_2' \tag{20}$$

Note that if $A_2$ is asymptotically stable, $P_{\eta f}$
approaches a steady state limit, $\bar{P}_{\eta f}$, as $\tau \to \infty$. The
variance $P_{\eta s}$ has been shown previously to be
$O(1/\mu)$ [8]. In fact, as $\mu \to 0$, $\eta$ approaches a white
noise process. Clearly, $P_{\eta s}$ dominates $P_{\eta f}$ for
small $\mu$; therefore, $P_\eta(\tau) \approx P_{\eta s}(\tau)$.
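
The steady state limit of (20) is easy to see in the scalar case, where the equation reduces to $dP/d\tau = 2 a_2 P + b_2^2 q_f$ with limit $-b_2^2 q_f / (2 a_2)$. The sketch below assumes hypothetical scalar values and uses a plain forward-Euler loop; the function names are illustrative only.

```python
def variance_fast(a2, b2, q_f, tau_end, h=1e-3, p0=0.0):
    """Forward-Euler integration of the scalar form of (20)."""
    p = p0
    for _ in range(int(round(tau_end / h))):
        p += h * (2.0 * a2 * p + b2 * q_f * b2)
    return p

def variance_steady(a2, b2, q_f):
    """Steady state of the scalar Liapunov equation (requires a2 < 0)."""
    return -b2 * q_f * b2 / (2.0 * a2)

# Hypothetical values: stable fast pole, unit input gain, noise intensity 0.5.
print(variance_fast(-1.0, 1.0, 0.5, 10.0), variance_steady(-1.0, 1.0, 0.5))
```

For matrix-valued $A_2$ the same limit is obtained from the algebraic Liapunov equation $A_2 P + P A_2' + B_2 Q_f B_2' = 0$.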

Similarly, the variance of $\xi(t)$, $P_\xi(t)$,
consists of a component due to $w_s$, $P_{\xi s}(t)$, and a
component due to $w_f$, $P_{\xi f}(t)$. $P_{\xi s}$ is easily found
to be the solution of the following Liapunov
equation [10]

$$\frac{dP_{\xi s}}{dt} = A_0 P_{\xi s} + P_{\xi s} A_0' + B_0 Q_s B_0' \tag{21}$$

In order to find $P_{\xi f}$, the solution to (3) is
required with input $w_f$

$$\xi(t) = \Phi_s(t,t_0)\,\xi(t_0) + \int_{t_0}^{t} \Phi_s(t,\sigma) B_0\, w_f\!\left(\frac{\sigma - t_0}{\mu}\right) d\sigma \tag{22}$$

Performing a change of variables in the integral
and post-multiplying by $\xi(t)'$ and taking the
expected value yields

$$P_{\xi f}(t) = \Phi_s(t,t_0) P_\xi(t_0) \Phi_s'(t,t_0)
+ \mu^2 \int_0^{\theta} \Phi_s(t,\mu\lambda + t_0) B_0 Q_f B_0' \Phi_s'(t,\mu\lambda + t_0)\, d\lambda \tag{23}$$

where $\theta = (t - t_0)/\mu$. Expanding this equation in a
Taylor series about $\mu = 0$ yields

$$P_{\xi f}(t) = \Phi_s(t,t_0) P_\xi(t_0) \Phi_s'(t,t_0) + O(\mu). \tag{24}$$

Therefore, the effect of $w_f(\tau)$ on the covariance
of $\xi(t)$ is $O(\mu)$ and $P_\xi(t) \approx P_{\xi s}(t)$.

Evidently, the variances of both the fast and
the slow time variables are dominated by the
white noise in t. Furthermore, as $\mu \to 0$, the white
noise in $\tau$ has negligible effect on either the
slow or the fast systems. Therefore, standard
singular perturbation techniques for stochastic
systems may be applied to this problem if the
quantization steps are small.


3.2  Large  Quantization  Steps 

When the quantization steps are large
compared to the feedback signal, the stochastic
error model described in Section 3.1 does not
apply. Also, the usual perturbation series
expansion techniques are not applicable to this
system due to the discontinuity in the quantizer
function [3]. However, the system in equations
(1)-(2) can still be separated into fast and slow
subsystems with $O(\mu)$ errors in the variables.
The analysis for this separation is carried out
for the decoupled system in equations (3)-(4)
with a scalar input and can suitably be applied
to the original system.

In the case of scalar quantized feedback,
the control input u in (3)-(4) can be written
$u = Q(-K_1 \xi - K_2 \eta)$ where $K_1$ and $K_2$ are row vectors
representing gain and $Q(\cdot)$ is a quantizer
function defined as follows

$$Q(x) = c_i, \quad d_i \le x < d_{i+1} \tag{25}$$

for i = 1, 2, ..., n.

It is assumed that $c_i$, $d_i$ are defined such that
$d_1 = -\infty$, $d_{n+1} = +\infty$, and $c_i < c_{i+1}$, $d_i < d_{i+1}$. An example
of this function is given in Fig. 1.
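
A quantizer of the form (25) can be sketched in a few lines. The helper name `make_quantizer` and the five-level example values are assumptions for illustration; only the finite break points $d_2, \ldots, d_n$ are stored, since $d_1$ and $d_{n+1}$ are infinite.

```python
from bisect import bisect_right

def make_quantizer(levels, thresholds):
    """Build Q of (25): Q(x) = c_i for d_i <= x < d_{i+1}.

    levels     -- c_1, ..., c_n in increasing order
    thresholds -- finite break points d_2, ..., d_n in increasing order
    """
    if len(thresholds) != len(levels) - 1:
        raise ValueError("n levels require n-1 finite thresholds")
    if any(a >= b for a, b in zip(levels, levels[1:])):
        raise ValueError("levels must be strictly increasing")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")

    def q(x):
        # bisect_right counts the thresholds d with d <= x, which selects
        # the half-open interval [d_i, d_{i+1}) that contains x.
        return levels[bisect_right(thresholds, x)]

    return q

# Illustrative uniform five-level quantizer.
Q = make_quantizer([-2, -1, 0, 1, 2], [-1.5, -0.5, 0.5, 1.5])
print([Q(x) for x in (-3.0, -1.5, 0.49, 0.5, 1.5)])
```

The half-open intervals matter only at the break points themselves: a value equal to $d_i$ is assigned the level above it.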

The fast system is written in the form of
equation (6) where $u = Q(-K_1 \xi_0 - K_2 \eta(\tau))$ assuming
that $\xi(\tau) = \xi_0$ is constant with respect to $\tau$. The
dependence of $Q(\cdot)$ on $\xi_0$ may be removed by
defining the following modified quantizer
function $Q'(x) = Q(-K_1 \xi_0 - x)$

$$Q'(x) = c_i, \quad -(d_i + K_1 \xi_0) \ge x > -(d_{i+1} + K_1 \xi_0) \tag{26}$$

for i = 1, 2, ..., n.

Since u can take on only a finite number of
values, this system is actually piecewise linear.
The phase space corresponding to this system is
partitioned into regions associated with each
possible value $c_i$ of u as seen from the definition
of $Q'(K_2 \eta)$. The boundaries of these regions
are defined by the solutions $\eta$ of the following
set of expressions:

$$\{K_2 \eta = -(d_i + K_1 \xi_0);\ i = 1, 2, \ldots, n\} \tag{27}$$

Regional equilibrium points are defined to
be $\eta_i = -A_2^{-1} B_2 c_i$ so that $\eta_i$ governs the behavior
of $\eta(\tau)$ inside the i-th region. Note that if a
regional equilibrium point does not lie in its
associated region, it is not a global equilibrium
point. Also, the assumption that $A_2$ is stable
means that there may exist a global equilibrium
point on one of the boundaries. Inspection of
the function $Q'(K_2 \eta)$ shows that only one possible
such equilibrium exists for a given value of $\xi_0$.

As $\tau \to \infty$, $\eta$ approaches its global equilibrium
point. Let $\eta_s$ be the equilibrium point and
define

$$\eta_f(\tau) = \eta(\tau) - \eta_s. \tag{28}$$

In the normal time scale, t, $\eta_f$ is seen to decay
rapidly from its initial condition to zero in the
initial boundary layer. As the slow variable
$\xi(t)$ begins to decay, the "equilibrium point" $\eta_s$
of the fast system changes values. In fact, $\eta_s$
can be written entirely as a continuous function,
$\eta_s = f(K_1 \xi(t))$, where f(x) is as shown

i) $f(x) = \eta_i$, if $-d_i - K_2 \eta_i \ge x > -d_{i+1} - K_2 \eta_i$
for some i,

ii) $f(x) = m(x + d_{i+1} + K_2 \eta_i) + \eta_i$, if
$-d_{i+1} - K_2 \eta_i \ge x > -d_{i+1} - K_2 \eta_{i+1}$ for some i,

(29)

where $m = (\eta_i - \eta_{i+1}) / [K_2 (\eta_{i+1} - \eta_i)]$.
Part i) corresponds to a regional equilibrium
lying inside its associated region and part ii)
corresponds to an equilibrium point lying on one
of the boundaries. An example of this function
is shown in Fig. 2 for a second order system.
The flat portions of the graphs correspond to
part i) of the function. The sloped lines are a
result of simple interpolation between the flat
parts and correspond to part ii). Note that the
slope of these lines is roughly inversely
proportional to $K_2$. If this slope is less than
$O(1/\mu)$ in order of magnitude, the variation of $\eta_s$
outside the boundary layer does not excite $\eta_f$ by
more than $O(\mu)$.

Therefore, since the fast variable satisfies

$$\eta(t) = \eta_f(\tau) + \eta_s(t) + O(\mu), \tag{30}$$

it is approximated by $\eta_f(\tau) + \eta_s(t_0)$ in the
boundary layer and by $\eta_s(t)$ otherwise. The slow
variable, $\xi$, is approximated up to $O(\mu)$ error by

$$\dot{\xi} = A_0 \xi + B_0 u; \quad \xi(t_0) = \xi_0 \tag{31}$$

$$u = Q(-K_1 \xi - K_2 \eta_s)$$

where $\eta_s(t) = f(K_1 \xi(t))$.

This procedure can be extended to the
multiple input case. However, the complexity of
finding the equilibrium points increases.

A remark can be made about the comparison
between analysis by time integration of the
actual system (3)-(4) and by time integration of
the reduced systems (6) (where $u = Q(-K_1 \xi_0 - K_2 \eta(\tau))$)
and (31). The usual benefits of using the
reduced models include integrating lower order
systems and using a larger time step for the slow
model. A peculiarity of this system is that $u_f$
does not necessarily go to zero as $\eta_f$ goes to
zero. This is a consequence of an equilibrium
point lying on the boundary between regions in
the phase space. If $\eta$ spirals into such an
equilibrium point, it keeps crossing into
different regions characterized by u switching
between two values. In other words, $\eta_f$ is driven
to zero by $u_f$ which must keep switching to
maintain the zero value of $\eta_f$. In the reduced
models, the boundary layer need only be evaluated
until $\eta_f$ decays sufficiently regardless of $u_f$.
The slow model does not depend on $u_f$. However,
the constant fast switching of $u_f$ may cause
numerical problems in time-integrating (3)-(4)
directly due to time step considerations.


4. NUMERICAL EXAMPLE

An example of the separation method discussed
in Section 3.2 is demonstrated here. The
system is given by equations (1)-(2) where [11]

$$K_s = [1 \;\; 0.89202], \quad K_f = [0.24396 \;\; 0.061996], \quad
x(0) = z(0) = [2.5 \;\; 0]' \tag{32}$$

and $\mu = .1$ and $u = Q(-K_s x - K_f z)$. The quantizer is
uniform with five possible states,

$$Q(x) = \begin{cases}
-2, & x < -1.5 \\
-1, & -1.5 \le x < -.5 \\
0, & -.5 \le x < .5 \\
1, & .5 \le x < 1.5 \\
2, & 1.5 \le x.
\end{cases} \tag{33}$$
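
The switching behavior noted in the remark of Section 3.2 can be reproduced with the quantizer (33) on a scalar fast system. The sketch below is an illustration under assumptions: the scalar values $a_2 = -1$, $b_2 = 1$, $k_2 = 1$ and the two values of $K_1 \xi_0$ are hypothetical, not the paper's second order example, and a forward-Euler loop stands in for the time integration.

```python
def quantize(x):
    """Five-level uniform quantizer of (33)."""
    if x < -1.5:
        return -2.0
    if x < -0.5:
        return -1.0
    if x < 0.5:
        return 0.0
    if x < 1.5:
        return 1.0
    return 2.0

def simulate_fast(eta0, k1_xi0, a2=-1.0, b2=1.0, k2=1.0, h=0.01, tau_end=20.0):
    """Euler integration of d(eta)/d(tau) = a2*eta + b2*Q(-k1_xi0 - k2*eta).

    Returns the final eta and the number of times the control changed level.
    """
    eta = eta0
    switches = 0
    u_prev = quantize(-k1_xi0 - k2 * eta)
    for _ in range(int(round(tau_end / h))):
        u = quantize(-k1_xi0 - k2 * eta)
        if u != u_prev:
            switches += 1
        u_prev = u
        eta += h * (a2 * eta + b2 * u)
    return eta, switches

# Case 1: the regional equilibrium eta = 0 (u = 0) lies inside its own region.
print(simulate_fast(1.0, 0.0))
# Case 2: no regional equilibrium lies in its region, so eta settles on the
# boundary -0.8 - eta = -0.5, i.e. eta = -0.3, and u keeps switching.
print(simulate_fast(1.0, 0.8))
```

In the second case the state stays within roughly one integration step of the boundary while the control alternates between two levels, which is the reason a direct integration of (3)-(4) is sensitive to the time step.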

Decoupling the system using the transformation of
Section 1, the regional equilibrium points are
found to be

$$\begin{aligned}
\eta_1 &= [-1.127 \;\; -2]' & \eta_4 &= [.563 \;\; 1]' \\
\eta_2 &= [-.563 \;\; -1]' & \eta_5 &= [1.127 \;\; 2]' \\
\eta_3 &= [0 \;\; 0]'
\end{aligned} \tag{34}$$

and the initial equilibrium $\eta_s(0) = \eta_1$. The
function $\eta_s = f(K_1 \xi)$ is shown in Fig. 2. Note
that the slope for $\eta_{s1}$ is approximately 2.8 and
for $\eta_{s2}$ is 5. This is less than $10 = 1/\mu$ but it
is large enough to cause excitation of the fast
variable $\eta_f$ as seen in the results.

The phase plane plot of $\eta_1(t)$ versus $\eta_2(t)$
found by integrating the actual system is shown
in Fig. 3 along with the initial equilibrium
point $\eta_s(0)$. The trajectory is attracted to
$\eta_s(0)$, then follows along the line of possible
equilibrium points to zero. This system is
approximated by two systems: a fast one and a
slow one. The fast system given by equation (6)
with appropriate parameter matrices starts at a
value of $\eta(0) - \eta_s(0)$ and decays to zero in $\tau$ as
$\eta(\tau) \to \eta_s(0)$. This is superimposed with the slow
system given by equation (31) integrated with
respect to t, so that $\eta(t) = \eta_f(\tau) + \eta_s(t)$. The
coordinates then are transformed back to x and z
for comparison.

Comparisons between time integration of the
original system and the approximated system are
shown in Fig. 4 to Fig. 7. Note that the initial
boundary layer is tracked very accurately. With
the exception of $z_2$, the errors between models
are $O(\mu)$. The fast variable, $\eta_f$, does appear to
be excited in both $z_1$ and $z_2$ near the transitions
corresponding to $\eta_s(t)$ moving along one of the
slopes in Fig. 2. The CPU time required for
time-integration of the actual system was 5-10
times longer than required for the approximated
system. Also, convergence problems occurred in
the actual system time integration while none
appeared in the approximated system. As $\mu$
decreases, numerical evaluation of the actual
system encounters more problems while the
approximated system becomes more accurate.


5.  SUMMARY 


The effect of quantization on a singularly
perturbed linear system is discussed. Three
cases of quantized control are studied: open
loop, closed loop with small quantization steps,
and closed loop with large quantization steps.
The open loop quantized control is decomposed
into a slow switching and a fast switching
component. The fast switching has $O(\mu)$ effect on
the slow subsystem, yet any change in the slow
control excites the fast subsystem and requires
evaluation of the boundary layer. The closed
loop system with small quantization steps is
evaluated by modelling the quantizer errors as
white noise in t and white noise in $\tau$. The white
noise in t dominates the behavior of both the
slow and the fast subsystems. Therefore, the
system may be evaluated using standard singular
perturbation theory for stochastic systems. A
closed loop system with large quantization steps
in the feedback violates the smoothness requirement
of standard perturbation methods, so a new
technique for separating this system into slow
and fast models is provided. The technique is
successfully illustrated via a numerical example.

Abstract

This  paper  analyzes  piecewise-linear  systems  which  are  singularly 
perturbed.  A  technique  is  developed  that  allows  decoupling  of  such  systems 
into  fast  and  slow  subsystems  for  analysis  and  design.  The  results  of  a 
numerical example are included to demonstrate this technique.

1. Introduction

Systems  inherently  possessing  both  slow  and  fast  dynamics  are  found 
commonly  in  electrical,  mechanical  and  aerospace  applications.  These  types 
of  systems  are  numerically  very  stiff  and,  hence,  are  difficult  to  analyze. 
This  problem  may  be  alleviated  by  using  singular  perturbation  theory  to 
separate  the  system  into  reduced-order  models,  one  containing  the  slow 
dynamics and one containing the fast dynamics. Reduced-order models are
easier to use in analysis and design because they lessen computational complexity.
In addition, time-integration of the lower order systems instead of the
full  order  model  reduces  computation  time  since  a  larger  time  step  can  be 
used  for  the  slow  dynamic  model.  Using  standard  singular  perturbation 
techniques,  however,  requires  that  the  system  dynamical  equations  be  smooth 
[1,2]  ruling  out  their  use  on  piecewise-linear  systems. 

This  paper  extends  the  general  method  of  singular  perturbation  for 
application  to  continuous  piecewise-linear  systems.  Such  systems  are  found 
in  electrical  circuits  and  in  flight  controls.  The  piecewise-linearity  may 
be  due  to  nonlinear  elements  such  as  saturation  or  may  result  from  a 
linearization  about  various  operating  points  of  a  nonlinear  plant.  These 
systems  may  exhibit  both  fast  and  slow  dynamics,  so  it  is  desirable  to 
separate  the  model  into  slow  and  fast  subsystems. 

Problem formulation

The system considered in this paper may be represented in the
following form:

$$\dot{x} = f_1(x,z), \quad x(t_0) = x_0 \tag{1}$$

$$\mu \dot{z} = f_2(x,z), \quad z(t_0) = z_0 \tag{2}$$

where $f_1$ and $f_2$ are continuous piecewise-linear functions, where $\mu > 0$ is a
small parameter and where $x \in R^p$ and $z \in R^r$. The functions are linear in
specific regions of the phase space ($R^{p+r}$) where a region is typically
defined as an intersection of halfspaces. For example, equations (1) and
(2) are represented in the i-th region by the following "linear" system:

$$\dot{x} = A_{11}^i x + A_{12}^i z + w_1^i \tag{3}$$

$$\mu \dot{z} = A_{21}^i x + A_{22}^i z + w_2^i \tag{4}$$

For the purposes of this paper, the i-th region is defined by the set
$S_i = \{(x,z) : d_{i-1} < K_x x + K_z z \le d_i\}$ where $K_x$ and $K_z$ are row vectors and
$d_{i-1} < d_i$ are scalars. By this definition, the type of regions allowed are
parallel in that the boundaries do not intersect. The reason for this
restriction will be discussed in Section 2.


The system given in equations (1) and (2) contains both fast and slow
dynamics. The variable x is primarily slow while z has both fast and slow
components. Starting from the initial conditions of equations (1) and (2),
the fast part of z quickly dies out and z converges to a quasi-steady-state
value (i.e., the slow component) in a short time interval $[t_0, t_0 + \delta)$ known
as the boundary layer. The fast component of z is then known as the
boundary layer solution. The solution of the system outside of the
boundary layer is termed the outer solution. It is desired to decouple
system (1)-(2) into fast and slow models which yield the boundary layer
solution and the outer solution, respectively. The boundary layer solution
is then used as a correction term to the outer solution so that the
combination is an approximation for the original system with errors on the
order of $O(\mu)$. A technique to decouple the system is developed in this
paper.

The following is an outline of the paper. Section 2 discusses the
boundary layer solution and a fast model approximating
this solution. The outer solution along with a corresponding reduced-order
model is discussed in Section 3. A numerical example is presented in
Section 4 to demonstrate the techniques developed in this paper. Section 5
concludes the paper.


2.  Boundary  Layer  Solution 

The fast dynamics of the system are most prominent during the boundary
layer and can be decoupled from the slow dynamics by introducing an
expanded time scale $\tau = (t - t_0)/\mu$. Examination of equation (1) shows that
x stays relatively constant with respect to $\tau$ assuming that $A_{11}^i$, $A_{12}^i$ and
$w_1^i$ are bounded in all regions $S_i$ [1]. Equation (2) may be rewritten as
follows:

$$\frac{d\hat{z}}{d\tau} = \hat{f}(\hat{z}) \tag{5}$$

where $\hat{z}(\tau) = z(\mu\tau + t_0)$ and $\hat{f}(z) = f_2(x_0, z)$. The function $\hat{f}$ is a continuous
piecewise-linear mapping from $R^r$ into $R^r$. The regions where the function
is linear are found by projecting the regions from $R^{p+r}$ of the original
problem into the manifold $x = x_0$. For example, the i-th region in $R^r$ is the
set $R_i = \{z : d_{i-1} < K_x x_0 + K_z z \le d_i\}$. A degenerate case where $K_z = 0$ results in
the existence of only one region in $R^r$ so that $\hat{f}$ is linear. (Note that the
equilibrium point of the degenerate case is easily found assuming that the
system is stable). A stable equilibrium point is the initial quasi-steady-state
value, $z_s(t_0)$, of z(t).

The equilibrium point(s) of system (5) for the nondegenerate case can
be found using solution techniques developed for piecewise-linear resistive
networks. Many papers have been written on finding the solution x of the
equation f(x)=y where f is a continuous piecewise-linear function, see for
example [3-9]. Fujisawa and Kuh show in [4] that a continuous piecewise-linear
function satisfies a Lipschitz condition. The following theorem from
[4] gives sufficient conditions for the existence and uniqueness of the
solution.

Theorem 1: Let f be a continuous piecewise-linear mapping of $R^r$ into
itself and let $J_k^i$ denote the matrix composed of the first k rows and
columns of the Jacobian matrix $J^i$ in region $R_i$. The mapping is a
homeomorphism of $R^r$ onto itself if, for each k = 1, 2, ..., r, the determinants
of the k×k matrices

$$J_k^1, J_k^2, \ldots$$

taken over all regions, do not vanish and have the same sign.

This previous work is used in finding the equilibrium point(s) of system
(5) by solving $\hat{f}(z) = 0$. In this application, $J^i = A_{22}^i$ and each $A_{22}^i$ is
assumed to be Hurwitz for stability purposes. The conditions of Theorem 1
may be stringent and various other sufficient conditions for the existence
and uniqueness of the solution are given in [9-11]. Also, reference [12]
discusses nonunique solutions.

2.1  Algorithm  to  Solve  for  Equilibrium  Point 

The Katzenelson algorithm is widely used in solving for x in the
equation

$$f(x) = y \tag{6}$$

where $f: R^r \to R^r$ is continuous and piecewise-linear. The basic outline of
this algorithm used in solving $\hat{f}(z) = 0$ is given below. More details of the
general method are given in [4]. Let $W^i = A_{21}^i x_0 + w_2^i$ for all i, and denote the
iteration number on $z_s$, y, and $\lambda$ by superscripts.

0) initialize by letting $z_s^1 = z_0$ and i = 1

1) solve $y^i = A_{22}^j z_s^i + W^j$, where region $R_j$ contains $z_s^i$

2) solve $z = z_s^i - (A_{22}^j)^{-1} y^i$

3) if z lies in region $R_j$ then $z_s = z$ and stop

4) otherwise, let $R_k$ be the region containing z;

if k>j then $d = d_j$ and then let j=j+1

if k<j then $d = d_{j-1}$ and then let j=j-1

5) solve $\lambda^i = (K_z z_s^i + K_x x_0 - d) / (K_z (A_{22}^j)^{-1} y^i)$

(assuming that the denominator is not zero)

6) solve $z_s^{i+1} = z_s^i - \lambda^i (A_{22}^j)^{-1} y^i$

7) let i=i+1 and go to 1)

It  is  shown  in  [4]  that  if  the  piecewise-linear  function  is  a  homeomorphism 
(e.g.  it  satisfies  the  conditions  of  Theorem  1)  then  the  algorithm  will 
converge  in  a  finite  number  of  steps. 
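
The steps above can be sketched for the scalar case r = 1 with $K_z = 1$ and $K_x x_0 = 0$, so that the regions are intervals on the real line. This is a simplified illustration, not the general algorithm of [4]: the function names and the three-region example are hypothetical, and the step toward the boundary is computed with the slope of the region being left, which is an assumption of this sketch.

```python
def region_of(z, boundaries):
    """0-based index j of the region d_{j-1} < z <= d_j (outer limits infinite)."""
    return sum(1 for d in boundaries if z > d)

def katzenelson_scalar(slopes, offsets, boundaries, z0, max_iter=100):
    """Solve f(z) = 0 where f(z) = slopes[j]*z + offsets[j] inside region j."""
    zs = z0
    j = region_of(zs, boundaries)
    for _ in range(max_iter):
        y = slopes[j] * zs + offsets[j]      # step 1: residual in region j
        if y == 0.0:
            return zs                        # already at the equilibrium
        step = y / slopes[j]                 # (A22^j)^{-1} y
        z = zs - step                        # step 2: Newton point of region j
        k = region_of(z, boundaries)
        if k == j:                           # step 3: solution lies in region j
            return z
        if k > j:                            # step 4: pick the boundary crossed
            d, j_next = boundaries[j], j + 1
        else:
            d, j_next = boundaries[j - 1], j - 1
        lam = (zs - d) / step                # step 5: fraction of the Newton step
        zs = zs - lam * step                 # step 6: land on the boundary
        j = j_next                           # step 7: continue in the next region
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical continuous three-region function with negative slopes:
#   f(z) = -z + 8 for z <= -1,  -3z + 6 for -1 < z <= 1,  -z + 4 for z > 1.
slopes, offsets, boundaries = [-1.0, -3.0, -1.0], [8.0, 6.0, 4.0], [-1.0, 1.0]
print(katzenelson_scalar(slopes, offsets, boundaries, z0=-3.0))
```

Starting from $z_0 = -3$ the iterate moves to the boundary at -1, then to the boundary at 1, and finally to the root of the third segment. All slopes share the same sign, which is the scalar form of the condition in Theorem 1.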


2.2  Boundary  Layer  Approximation 

A fast model approximating the dynamics occurring in the boundary
layer can be found once the equilibrium point of system (5) is known. The
boundary layer solution is then given as $z_f(\tau) = z(\tau) - z_s(t_0)$. In many
singular perturbation cases, the equilibrium point $z_s$ can be written as an
explicit function of x so the fast model is typically given in terms of $z_f$.
In this application, $z_s$ must be found implicitly. Therefore, the fast
model approximating the boundary layer solution is given in terms of z. In
the i-th region the fast model is given by

$$\frac{dz}{d\tau} = A_{21}^i x_0 + A_{22}^i z + w_2^i, \quad z(0) = z_0$$

$$z_f(\tau) = z(\tau) - z_s(t_0) \tag{7}$$

where the i-th region is defined by the set $R_i = \{z : d_{i-1} < K_x x_0 + K_z z \le d_i\}$.

For  the  purposes  of  this  paper,  it  is  assumed  that  there  exists 
exactly  one  equilibrium  point  which  is  asymptotically  stable.  Multiple 
stable  equilibrium  points  may  be  handled  by  partitioning  the  phase  space 
into  domains  of  attraction  for  the  various  equilibrium  points  and  the 
analysis  in  this  paper  holds  for  each  domain  of  attraction. 

Asymptotic  stability  is  assumed  in  this  system  though  there  is  no 
known  general  method  for  determining  asymptotic  stability  of  piecewise- 
linear  systems.  Depending  on  the  specific  system  under  consideration,  a 
Lyapunov  function  may  be  found.  Another  possibility  is  to  use  standard 
SISO  frequency  domain  techniques  or  hyperstability.  For  using 
hyperstability  notions,  system  (5)  may  be  rewritten  as 

    $\frac{dz}{d\tau} = Az + Bu$    (8)

where $A$ is chosen to be stable, $B$ is the identity $I$, and $u$ is
defined in the $i$th region to be $u = \Delta A^i z + A_{21}^i x_0 + w_2^i$
where $\Delta A^i = A_{22}^i - A$. If the nonlinearity in the feedback loop
satisfies the Popov integral inequality, then the necessary and sufficient
condition for asymptotic stability is that the transfer matrix
$(sI - A)^{-1}$ must be strictly positive real [13].

The errors in this approximation are due to the approximation of $x$ by
$x_0$, introducing errors of $O(\mu)$ into equation (7) and into the
definition of the regions. Substituting $x = x_0 + O(\mu)$ into (7) and into
the definition of the regions yields the system

    $\frac{d\hat{z}}{d\tau} = A_{21}^i (x_0 + O(\mu)) + A_{22}^i \hat{z} + w_2^i, \quad \hat{z}(0) = z_0$    (9)

    $R_i = \{\hat{z} : d_{i-1} + O(\mu) < K_x x_0 + K_z \hat{z} \le d_i + O(\mu)\}$

where $\hat{z}$ represents the actual response. In the interior of any
particular region, both the approximation and the actual model are linear.
Previous results on singular perturbation theory in linear systems show
that if $\hat{z}(\tau') = z(\tau') + O(\mu)$ then
$\hat{z}(\tau'') = z(\tau'') + O(\mu)$ for $\tau'' > \tau'$ as long as both
$z$ and $\hat{z}$ stay within the region. The problems that may arise due to
a boundary crossing are eliminated if the class of systems allowed is
restricted to those in which the vector field intersects a boundary
hyperplane at a large enough angle (i.e. $O(\mu^0)$). In these systems if
either $z$ or $\hat{z}$ crosses into another region, the other must also
cross into that region. The resulting error in the approximation remains on
the order of $O(\mu)$. These conditions are summarized in the following
theorems. Note that the restriction placed on the class of systems is
sufficient and not necessary for proving that the approximation is on the
order of $O(\mu)$.


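Theorem 2: Let the vector field near a boundary at
$d_i = K_z \hat{z} + K_x x_0 + O(\mu)$ be given by

    $f(\hat{z}) = A_{21}^i (x_0 + O(\mu)) + A_{22}^i \hat{z} + w_2^i$    (10)

where $\hat{z}$ denotes the actual response, i.e. the solution of (9).
Assume that $f(\hat{z})$ does not vanish near the boundary. If $f(\hat{z})$
intersects the boundary with an angle of order $O(\mu^0)$, then the
difference between the solutions of (7) and (9) is $O(\mu)$.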

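Proof: Let $z$ denote the solution of (7) and $\hat{z}$ the solution of (9).
Assume $\hat{z}$ crosses the $d_i$ boundary at $\tau'$ and $z$ has not
crossed any boundary. Prior to crossing, $\hat{z} = z + O(\mu)$. The normal
vector of the boundary hyperplane is given by $n = K_z^T / \|K_z\|$. Since
$f(\hat{z}) \cdot n = O(\mu^0)$, then

    $K_z (A_{21}^i (x_0 + O(\mu)) + A_{22}^i \hat{z} + w_2^i) = O(\mu^0)$.    (11)

It follows that

    $K_z (A_{21}^i x_0 + A_{22}^i z + w_2^i) = O(\mu^0)$.    (12)

Define $s$ and $\hat{s}$ by

    $s = K_z z - d_i'$    (13)

    $\hat{s} = K_z \hat{z} - d_i' + O(\mu)$    (14)

where $d_i' = d_i - K_x x_0$. Assume $s, \hat{s} > 0$. For $\hat{z}$ to cross
the boundary, $d\hat{s}/d\tau < 0$ where $d\hat{s}/d\tau$ is given by
expression (11). Correspondingly, $ds/d\tau < 0$ where $ds/d\tau$ is given
by expression (12). At the boundary crossing, $\hat{s} = 0$ so that
$K_z \hat{z} - d_i' = O(\mu)$. It follows that
$s(\tau') = K_z \hat{z} - d_i' + O(\mu) = O(\mu)$. Since
$ds/d\tau = O(\mu^0)$ then $\Delta s / \Delta\tau = O(\mu^0)$. Hence,
$\Delta\tau = O(\mu)$ since $\Delta s = s(\tau') = O(\mu)$.

Therefore, if $\hat{z}$ crosses a boundary into a new region at $\tau'$,
then $z$ must also cross into the same region at a time $\tau''$ such that
$\tau'' = \tau' + O(\mu)$.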

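It remains to be shown that the time difference of $O(\mu)$ in the boundary
crossing has $O(\mu)$ effect on the solution. Here $z$ is the solution of
(7), which crosses at $\tau''$, and $\hat{z}$ is the solution of (9), which
crosses at $\tau'$. Let $A = A_{22}^i$ and $\Delta A = A_{22}^j - A$ where
$R_j$ is the new region, and let $\tau_0 < \tau'$ be such that both
$z(\tau_0)$ and $\hat{z}(\tau_0)$ lie in region $R_i$. Then the solution of
(7) for $\tau > \tau''$ is

    $z(\tau) = \Phi(\tau, \tau_0) z(\tau_0) + \int_{\tau''}^{\tau} \Phi(\tau, \sigma)(\Delta A\, z + A_{21}^j x_0 + w_2^j)\, d\sigma$
    $\qquad + \int_{\tau_0}^{\tau''} \Phi(\tau, \sigma)(A_{21}^i x_0 + w_2^i)\, d\sigma$    (15)

where $\Phi(\tau, \tau') = \exp[A(\tau - \tau')]$. Since the integrands are
bounded in both integrals and $\tau'' - \tau' = O(\mu)$, equation (15) is
rewritten as

    $z(\tau) = \Phi(\tau, \tau_0) z(\tau_0) + \int_{\tau'}^{\tau} \Phi(\tau, \sigma)(\Delta A\, z + A_{21}^j x_0 + w_2^j)\, d\sigma$
    $\qquad + \int_{\tau_0}^{\tau'} \Phi(\tau, \sigma)(A_{21}^i x_0 + w_2^i)\, d\sigma + O(\mu)$    (16)

Similarly, the solution to equation (9) is found to be

    $\hat{z}(\tau) = \Phi(\tau, \tau_0) \hat{z}(\tau_0) + \int_{\tau'}^{\tau} \Phi(\tau, \sigma)(\Delta A\, \hat{z} + A_{21}^j x_0 + w_2^j)\, d\sigma$
    $\qquad + \int_{\tau_0}^{\tau'} \Phi(\tau, \sigma)(A_{21}^i x_0 + w_2^i)\, d\sigma + O(\mu)$    (17)

Hence, $\hat{z}(\tau) = z(\tau) + O(\mu)$.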
Theorem 3: Let the vector field near a boundary at $d_i = K_z z + K_x x_0$
be given by

    $f(z) = A_{21}^i x_0 + A_{22}^i z + w_2^i$.    (18)

Assume that $f(z)$ does not vanish near the boundary. If $f(z)$ intersects
the boundary with an angle of order $O(\mu^0)$, then the difference between
the solutions of (7) and (9) is $O(\mu)$.

Proof: The proof is very similar to that of Theorem 2. With $z$ the
solution of (7) and $\hat{z}$ the solution of (9), the gist of the proof is
to show that if $z$ crosses the boundary prior to a crossing of $\hat{z}$,
then $\hat{z}$ must cross within a time of $O(\mu)$. The time delay in
crossing affects the error in the approximation only by $O(\mu)$.

Using the results of Theorems 2 and 3 it is seen that the errors in
the approximation are on the order of $O(\mu)$. The restriction given in
Section 1 that the regions of linearity be parallel is used to avoid the
problem of "corners". It is not known at this time how good the
approximation is if either $\hat{z}$ or $z$ crosses a boundary at the
intersection of two boundary hyperplanes.

3.  Outer  Solution 

A reduced-order model for system (1)-(2) is developed below with
approximation errors on the order of $O(\mu)$ for the time outside of the
boundary layer. Assuming that the fast subsystem given in equation (7) is
asymptotically stable to its equilibrium point, the fast component of $z$ is
negligible outside of the boundary layer. Therefore, the variables of the
reduced-order slow model are $x$ and the quasi-steady-state value $z_s$ of
$z$. $z_s$ is the equilibrium point of system (7) where $x_0$ is replaced
with $x$. Hence, the quasi-steady-state value of $z$ is a continuous
implicit function of $x$. (Continuity is shown below.) $z_s$ can be
determined using the Katzenelson algorithm described in Section 2.1 where
the current value of $x$ is substituted for $x_0$ and the algorithm is
initialized with $z_s^1$ equaling the previous value of $z_s$. Due to
continuity, a small change in $x$ results in a small change in $z_s$ so that
in almost all cases while time-integrating the system only steps 1)-3) are
needed to find a new $z_s$. Continuity of $z_s$ as a function of $x$ is
shown in the proof of the following theorem.

Theorem 4: Let $f: R^r \to R^r$ be a continuous piecewise-linear mapping
defined in the $i$th region by

    $f(z) = A_{21}^i x + A_{22}^i z + w_2^i$    (19)

If $f$ is a homeomorphism then the equilibrium point $z_s$ of (19) is given
by a continuous function of $x$.

Proof: Since $f$ is a homeomorphism, a unique solution for $z_s$ exists for
any $x$. Let $x_1$ be given resulting in $z_s = z_{s,1}$. Let $S_i$ denote
the region of $(x_1, z_{s,1})$ in $R^{p+r}$.

Suppose $(x_1, z_{s,1})$ lies in the interior of region $S_i$. Then
$z_{s,1}$ can be written as

    $z_{s,1} = -(A_{22}^i)^{-1} (A_{21}^i x_1 + w_2^i)$    (20)

Choose $x_2$ close to $x_1$ resulting in $z_s = z_{s,2}$ such that
$(x_2, z_{s,2})$ lies in region $S_i$. Hence,

    $z_{s,2} = -(A_{22}^i)^{-1} (A_{21}^i x_2 + w_2^i)$    (21)

It is clear that $z_s$ is a continuous function of $x$ at $x_1$ supposing
that there exists a $\delta > 0$ such that $(x, z_s)$ lies in region $S_i$
for all $x$ such that $\|x - x_1\| < \delta$. For the region defined by the
set $\{(x, z_s) : d_{i-1} < K_x x + K_z z_s \le d_i\}$ a $\delta$ is given by


where $M = K_x - K_z (A_{22}^i)^{-1} A_{21}^i$.

Therefore,  zs  is  a  continuous  function  of  x  for  all  x  such  that  (x,zs)  lies 
in  the  interior  of  a  region. 

Suppose $x_1$ is given so that $(x_1, z_{s,1})$ lies on a boundary, say
$d_i = K_x x_1 + K_z z_{s,1}$.

Choose $x_2$ close to $x_1$ resulting in $z_s = z_{s,2}$. If
$(x_2, z_{s,2})$ lies in region $S_i$ then the above analysis is applied and
$z_s$ is considered to be continuous from the closed half-plane in region
$S_i$. If $(x_2, z_{s,2})$ lies in region $S_{i+1}$, then

    $z_{s,2} = -(A_{22}^{i+1})^{-1} (A_{21}^{i+1} x_2 + w_2^{i+1})$    (22)

A consequence of the continuity of $f$ is that

    $-(A_{22}^i)^{-1} (A_{21}^i x_1 + w_2^i) + (A_{22}^{i+1})^{-1} (A_{21}^{i+1} x_1 + w_2^{i+1}) = 0$    (23)

Adding equation (23) to equation (22), subtracting the result from (20) and
taking the norm of both sides yields:

    $\|z_{s,1} - z_{s,2}\| = \|(A_{22}^{i+1})^{-1} A_{21}^{i+1} (x_2 - x_1)\| \le \|(A_{22}^{i+1})^{-1} A_{21}^{i+1}\| \, \|x_2 - x_1\|$

Hence, $z_s$ satisfies a Lipschitz condition in the open half-plane in
region $S_{i+1}$. Therefore, $z_s$ is continuous for $x$ such that
$(x, z_s)$ lies on a boundary hyperplane. Thus, $z_s$ is a continuous
function of $x$.

The reduced-order slow model of system (1)-(2) for $t$ outside of the
boundary layer, i.e. $t > t_0 + \delta$, is given as follows:

    $\dot{x}_s = A_{11} x_s + A_{12} z_s + w_1, \quad x_s(t_0) = x_0$    (24)

where $z_s$ is an implicit function of $x$ and is found using the
Katzenelson algorithm.

The error in the approximation is due entirely to the fact that
$z = z_s + O(\mu)$. This error is analogous to the error of approximating
$x$ by $x_0$ in the boundary layer solution. Therefore, the effect of the
error can be analyzed similarly as in Theorems 2 and 3, showing that the
errors in the solution are on the order of $O(\mu)$.
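
As a rough numerical illustration of the outer solution, the sketch below compares a full two-time-scale model with its reduced-order slow model. Everything in it is a hypothetical scalar stand-in chosen for this illustration (the model, the value of the small parameter, the initial conditions, and the explicit Euler integrator); it does not reproduce any system or result from this report. The quasi-steady state is found region by region in closed form rather than with the Katzenelson algorithm, which is possible only because the stand-in model is scalar.

```python
import numpy as np

MU = 0.01  # hypothetical small parameter

def sat(s):
    """Unit saturation."""
    return float(np.clip(s, -1.0, 1.0))

def zs_of_x(x):
    """Quasi-steady state: root of x - z - sat(z) = 0, solved region by region."""
    if abs(x) <= 2.0:
        return x / 2.0            # linear region, |z| <= 1
    return x - float(np.sign(x))  # saturated regions, |z| > 1

def simulate(T=2.0, dt=1e-3, x0=3.0, z0=0.0):
    """Explicit Euler on the full model and on the reduced-order slow model.

    Full model:    dx/dt = -x - 0.5 z,   MU dz/dt = x - z - sat(z)
    Reduced model: dx/dt = -x - 0.5 zs(x)
    Returns final slow states and the largest gap between them along the run.
    """
    x_full, z_full, x_red = x0, z0, x0
    max_gap = 0.0
    for _ in range(int(round(T / dt))):
        dx = -x_full - 0.5 * z_full
        dz = (x_full - z_full - sat(z_full)) / MU
        x_full, z_full = x_full + dt * dx, z_full + dt * dz
        x_red += dt * (-x_red - 0.5 * zs_of_x(x_red))
        max_gap = max(max_gap, abs(x_full - x_red))
    return x_full, x_red, max_gap

x_full, x_red, max_gap = simulate()
print(x_full, x_red, max_gap)
```

The slow state starts in a saturated region and passes into the linear region as it decays, so the quasi-steady state switches regions during the run. The gap between the two slow states is expected to stay on the order of the small parameter.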


4.  Example 

The techniques previously described for separating a piecewise-linear
singularly perturbed system are demonstrated on the example below. The
model represents a linear system with a saturation nonlinearity in the
feedback loop. Such types of models exist in both flight controls and in
electrical circuits. The system is given by

    $\dot{x} = A_{11} x + A_{12} z - B_1 u$    (25)

    $\mu \dot{z} = A_{21} x + A_{22} z - B_2 u$    (26)

    $u = \begin{cases} -1 & \text{if } K_x x + K_z z < -1 \\ K_x x + K_z z & \text{if } |K_x x + K_z z| \le 1 \\ 1 & \text{if } K_x x + K_z z > 1 \end{cases}$

where $\mu = 0.1$. The parameter matrices are given as follows:


    $K_x = [1 \;\; 0.861]$,  $K_z = [1.220 \;\; 0.310]$

The initial conditions are given as $x(0) = z(0) = [2. \;\; 3.]'$.

This system is put into the piecewise-linear form by substituting the
expressions for $u$ into equations (25) and (26). The three regions are
defined as $S_1 = \{(x,z) : K_x x + K_z z < -1\}$,
$S_2 = \{(x,z) : |K_x x + K_z z| \le 1\}$ and
$S_3 = \{(x,z) : K_x x + K_z z > 1\}$. The initial condition is located in
$S_3$.


The reduced-order models given in the form of equations (7) and (24)
are used in finding the time response. Comparisons between these results
and those obtained by time-integrating the full order model are shown in
Figure 1 through Figure 4. Note that the approximation matches the actual
response very closely, i.e. within an order of $O(\mu)$. The computation
time for the approximation was roughly half as long as for the actual
system. Furthermore, as the value of $\mu$ decreases, the approximation
becomes more accurate and the relative computation time decreases due to
the numerical stiffness in the actual system.

5. Summary

A  singular  perturbation  technique  is  developed  in  this  paper  which 
allows  for  a  decoupling  of  a  continuous  piecewise-linear  system  into  slow 
and  fast  subsystems.  Under  the  assumption  of  asymptotic  stability,  the 
fast  variable  is  found  to  decay  in  the  boundary  layer  to  its  quasi-steady- 
state  solution.  This  quasi-steady-state  solution  is  given  by  a  continuous 
implicit  function  of  the  slow  variable.  The  solution  is  found  using  the 
finite step algorithm given in the paper. Sufficient conditions for the
approximation to be accurate to an order of $O(\mu)$ are given. The
technique developed is successfully illustrated via a numerical example.
ABSTRACT:


In  this  paper  a  special  class  of  linear  piece-wise  constant  time-varying 
systems  is  introduced.  These  systems  are  called  hybrid  systems  because  the  set 
of linear time-invariant systems among which the system is switching is
finite.  This  kind  of  model  can  be  used  to  represent  systems  subject  to  known 
abrupt  parameter  variations  such  as  commutated  networks  or  to  approximate 
certain  time-varying  systems.  Our  main  results  are:  a  necessary  and 
sufficient  algebraic  condition,  a  very  simple  algebraic  criterion,  and  a 
computationally  appealing  algebraic  sufficient  test  for  controllability  and 
observability.  Moreover  we  give  a  simple  sufficient  stability  condition. 
1. INTRODUCTION AND PROBLEM FORMULATION:

In  this  paper  we  examine  the  controllability,  observability  and  related 
issues  of  hybrid  systems.  Hybrid  systems  in  this  paper  are  a  special  class  of 
piece-wise  constant  time-varying  systems.  The  set  of  constant  realizations 
among  which  the  model  is  switching  is  finite.  Systems  of  this  type  can  be  used 
to  model  synchronously  switched  linear  systems  [1],  networks  with  periodically 
varying  switches  [2]  and  fault-prone  systems  [3].  Even  though  hybrid  systems 
are  time-varying  they  lend  themselves  to  a  precise  and  complete  qualitative 
and quantitative analysis. Among such results we mention the possibility to
explicitly compute their transition matrices, to derive and state necessary
and sufficient conditions to test for their stability, and most
interestingly the possibility to derive algebraic
controllability/observability tests similar to the celebrated one found in
the theory of linear time-invariant systems. This is possible because of
the many features hybrid systems share with time-invariant systems.
Moreover, because they are time-varying, they offer many useful features
due to their variable structure property. In other words, hybrid systems
are a mixture of time-invariant systems, with which they share the
algebraic and geometric structures, and time-varying systems, from which
they inherit the variable structure property that is of great help in
controlling, observing and stabilizing them.

The  class  of  hybrid  systems  considered  in  this  paper  are  assumed  to  have 
the  form 


    $dx/dt = A(r(t))\, x(t) + B(r(t))\, u(t)$    (1.1)

    $y(t) = C(r(t))\, x(t)$    (1.2)

where x is the plant state vector of dimension n, u is the plant control
input vector of dimension p, y is the plant output vector of dimension m,
and r(t) is the "form index" which is a deterministic scalar sequence taking
values in the finite index set N = {1, 2, ..., N}.

This  type  of  model  can  be  used  to  represent  systems  subject  to  known 
abrupt  parameter  variations  such  as  commutated  networks  or  to  approximate 
certain  time-varying  systems.  This  can  be  done  by  imposing  a  "deterministic" 
switching  rule  on  the  time  behavior  of  the  form  index.  However,  to  model 
unknown  abrupt  phenomena  such  as  component  and  interconnection  failures  the 
form  index  can  be  modeled,  for  example,  as  a  finite-state  Markov  chain. 

The latter problem has received considerable attention from the control
community but many important generalizations remain to be worked out.
Chizeck et al [3] call such a control problem the Jump Linear Quadratic
(JLQ) problem since they view it as an extension of the standard Linear
Quadratic (LQ) problem. However, very little attention was given to the
deterministic version of the problem, even though it shares many features
with the JLQ problem. This paper is concerned with the deterministic
version.

Let $S_M$ denote any sequence of length $M$ of the values taken by $r(t)$,
and let $\delta t_i$ denote the time interval during which $r(t) = i$.
Throughout the paper the following assumption is made: $S_N$ contains all
the values that $r(t)$ takes. In this case we define

    $T = \sum_{i=1}^{N} \delta t_i$

as the period of the system. If in addition the sequence in every $S_N$ is
the same the system is called a periodic hybrid system. It will be obvious
that making $M > N$ in the assumption on $S_M$ will not affect the results.
The ($M = N$)-assumption makes the notation less cumbersome.

The ith form is the realization $\Sigma_i = (A_i, B_i, C_i)$ associated with
the ith form index (i.e., $r(t) = i$), with $i \in N$.

The following is an outline of the paper. Section 2 discusses the
stability of hybrid systems where a simple sufficient stability criterion is
derived. The observability and controllability of periodic hybrid systems
are treated in sections 3 and 4 respectively. Algebraic observability and
controllability tests are obtained. Section 5 extends the results of
sections 3 and 4 to general hybrid systems. In section 6 the stabilizability
of hybrid systems is addressed and a simple example is used for illustration
purposes. Section 7 concludes the paper and points to possible future
research directions.

2. STABILITY:

Even though hybrid systems are time-varying systems it is possible to
obtain necessary and sufficient asymptotic stability conditions. To gain
some familiarity with these systems we start by studying the stability of
periodic hybrid systems. To this end we recall a theorem by Willems [5] that
gives necessary and sufficient conditions under which piece-wise constant
periodic systems will be uniformly asymptotically stable.

THEOREM  [5]: 


The null solution of the periodic hybrid system (piecewise constant
periodic system) (1) is uniformly asymptotically stable iff the eigenvalues
of the matrix

    $\Phi(t_0 + T, t_0) = \prod_{i=N}^{1} \mathrm{Exp}(A_i \delta t_i)$    (2)

have magnitudes less than 1. It is unstable if at least one eigenvalue of
this matrix has magnitude greater than 1.

Basically  what  the  theorem  is  saying  is  that  for  the  system  to  be 
asymptotically  stable  its  transition  matrix  over  one  period  of  time  has  to  be 
a  contraction. 
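
The eigenvalue test can be tried numerically with the short sketch below. It is an illustration under assumptions: the helper `expm` is a simple scaling-and-squaring Taylor approximation written here to keep the example dependency-free, and the two forms and dwell times are hypothetical values chosen so that each form alone is unstable while the periodic system is stable.

```python
import numpy as np

def expm(M, terms=20, squarings=10):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    M = np.asarray(M, dtype=float) / 2**squarings
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ M / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

def monodromy(A_list, dts):
    """Transition matrix over one period: Exp(A_N dt_N) ... Exp(A_1 dt_1)."""
    Phi = np.eye(A_list[0].shape[0])
    for A, dt in zip(A_list, dts):
        Phi = expm(A * dt) @ Phi   # later forms multiply from the left
    return Phi

def periodic_stability(A_list, dts):
    """Return (stable?, largest eigenvalue magnitude of the one-period matrix)."""
    rho = float(max(abs(np.linalg.eigvals(monodromy(A_list, dts)))))
    return rho < 1.0, rho

# Hypothetical forms: each one has an unstable eigenvalue.
A1 = np.diag([1.0, -3.0])
A2 = np.diag([-3.0, 1.0])
stable, rho = periodic_stability([A1, A2], [1.0, 1.0])
print(stable, rho)
```

For these values the one-period transition matrix is exp(-2) times the identity, so the test reports stability even though neither form is stable on its own.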

It  is  obvious  how  to  modify  this  theorem  to  derive  a  similar  one  for 
hybrid  systems  which  are  not  necessarily  periodic.  However  the  theorem  will  be 
difficult  to  use,  since  one  has  to  compute  matrices  similar  to  (2)  N!  times 
and  check  their  eigenvalues. 


In order to derive simpler conditions to test for the stability of such
systems, a different norm is defined, namely the logarithmic norm. The
result is a simpler condition that is only sufficient.

The  logarithmic  norm  (also  known  as  the  logarithmic  derivative,  the 
measure  of  a  matrix)  was  introduced  in  1958  separately  by  Dahlquist  [6]  and 
Lozinskij  [7]  as  a  tool  to  study  the  growth  of  solutions  to  ordinary 
differential  equations  and  the  error  growth  in  discretization  methods  for 
their  approximate  solution.  It  is  formally  defined  as  follows: 


DEFINITION: 


The logarithmic norm associated with the matrix norm $\|\cdot\|$ is defined
by

    $\mu(A) = \lim_{h \to 0^+} (\|I + hA\| - 1)/h$.    (3.1)

An explicit expression for the logarithmic norm associated with the
Euclidean norm is

    $\mu(A) = \max\{\nu : \nu \in \lambda((A + A^*)/2)\}$.    (3.2)

Then the following inequality is true:

    $\|\mathrm{Exp}(At)\| \le \mathrm{Exp}(\mu(A) t)$    (4)

One very important property of the logarithmic norm follows from the fact
that it may be shown to be the smallest element of

    $S = \{x : \|\mathrm{Exp}(At)\| \le \mathrm{Exp}(xt),\ t \ge 0\}$.    (5)

Therefore it gives an optimal bound on the exponential behavior of
$\|\mathrm{Exp}(At)\|$ for $t \ge 0$. It may be concluded immediately that

    $\sup_{t \ge 0} \|\mathrm{Exp}(At)\| = 1$ iff $\mu(A) \le 0$.    (6)

In the case where $A$ is a normal square matrix (i.e., $A^*A = AA^*$), then

    $\|\mathrm{Exp}(At)\| = \mathrm{Exp}(\alpha(A) t) = \mathrm{Exp}(\mu(A) t)$,    (7)

where $\alpha(A)$ is the spectral abscissa, i.e. the maximal real part of
the eigenvalues of $A$.

Now  we  are  ready  to  apply  the  logarithmic  norm  to  derive  a  simple 
sufficient  condition  to  test  for  the  stability  of  hybrid  systems. 

THEOREM 1:

For the null solution of the hybrid system (1) to be uniformly
asymptotically stable, it is sufficient to have

    $\sum_i \mu(A_i)\, \delta t_i < 0, \quad \delta t_i = t_i - t_{i-1}, \quad i \in N$.    (8)

The proof is a simple application of the logarithmic norm to Willems's
theorem. It is important to note that the above theorem is stated not only
for periodic hybrid systems; it also applies to the more general hybrid
systems defined above.
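
A minimal sketch of this sufficient test for the Euclidean norm follows. The function names and the numerical forms are hypothetical and serve only to illustrate the computation; real-valued matrices are assumed, so the conjugate transpose reduces to the transpose.

```python
import numpy as np

def log_norm_2(A):
    """Logarithmic norm for the Euclidean norm: largest eigenvalue of (A + A^T)/2."""
    A = np.asarray(A, dtype=float)
    return float(np.linalg.eigvalsh((A + A.T) / 2.0).max())

def sufficient_stability(A_list, dts):
    """Sufficient test: sum_i mu(A_i) * dt_i < 0. Returns (passed?, value of the sum)."""
    total = sum(log_norm_2(A) * dt for A, dt in zip(A_list, dts))
    return total < 0.0, total

# Hypothetical forms: mu(A1) = -1.5 and mu(A2) = 0.5.
A1 = np.array([[-2.0, 1.0], [0.0, -2.0]])
A2 = np.array([[0.5, 0.0], [0.0, -1.0]])
passed, total = sufficient_stability([A1, A2], [1.0, 1.0])
print(passed, total)

# The test is only sufficient: these two forms fail it (the sum is 2).
failed, total_2 = sufficient_stability(
    [np.diag([1.0, -3.0]), np.diag([-3.0, 1.0])], [1.0, 1.0]
)
print(failed, total_2)
```

The second pair fails the test although, with equal dwell times, its one-period transition matrix is exp(-2) times the identity and the periodic system is therefore stable. This illustrates that a failed test is inconclusive.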
3. OBSERVABILITY:


Since  hybrid  systems  are  a  special  class  of  time-varying  systems  they 
display  interesting  properties  relative  to  controllability  and  observability. 
It  would  be  appropriate  to  define  the  latter  properties  while  keeping  in  mind 
the  fact  that  these  systems  are  variable  structure  systems.  We  start  with  the 
observability  criterion  because  it  is  simpler  to  prove.  Consequently,  the  dual 
controllability  criterion  is  stated  by  appealing  to  the  duality  principle. 




Definition: 


A periodic hybrid system is said to be observable if there exists some
finite $t_f \ge t_0 + T$ such that the initial state $x(t_0)$ of the
unforced system can be determined from the knowledge of $y(t)$ on
$[t_0, t_f]$.

Using  the  above  definition  it  is  possible  to  state  an  algebraic  necessary 
and  sufficient  observability  criterion  very  similar  to  the  usual  algebraic 
test.  Moreover  this  algebraic  test  is  expressed  as  a  function  of  the 
observability  matrices  of  the  different  forms.  This  condition  is  a 
generalization  of  the  well  known  algebraic  observability  test. 

THEOREM  2: 


A periodic N-form hybrid system is observable iff the observability matrix

    $\mathcal{O} = \begin{bmatrix} O_1 \\ O_2\, \mathrm{Exp}(A_1 \delta t_1) \\ \vdots \\ O_N\, \mathrm{Exp}(A_{N-1} \delta t_{N-1}) \cdots \mathrm{Exp}(A_1 \delta t_1) \end{bmatrix}$    (9)

has full rank, where $O_i$ is the observability matrix of the ith form,
$i \in N$.


Proof : 


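Let us assume that the system is in its ith form at time
$t \in [t_{i-1}, t_i]$; then the output is given by the following expression

    $y(t) = C_i\, \mathrm{Exp}(A_i (t - t_{i-1})) \prod_{j=i-1}^{1} \mathrm{Exp}(A_j \delta t_j)\, x(t_0)$.    (10)

Taking n-1 derivatives of y(t) and arranging them in a matrix yields

    $Y_i(t) = O_i\, \mathrm{Exp}(A_i (t - t_{i-1})) \prod_{j=i-1}^{1} \mathrm{Exp}(A_j \delta t_j)\, x(t_0)$    (11)

where $O_i$ is the observability matrix of the ith form. Now repeating the
same procedure for all $i \in N$ and stacking the $Y_i$'s starting with
$Y_1$ gives the following equation

    $\begin{bmatrix} Y_1(t) \\ Y_2(t) \\ \vdots \\ Y_N(t) \end{bmatrix} = \begin{bmatrix} O_1\, \mathrm{Exp}(A_1 (t - t_0)) \\ O_2\, \mathrm{Exp}(A_2 (t - t_1))\, \mathrm{Exp}(A_1 \delta t_1) \\ \vdots \\ O_N\, \mathrm{Exp}(A_N (t - t_{N-1})) \cdots \mathrm{Exp}(A_1 \delta t_1) \end{bmatrix} x(t_0)$.    (12)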
Before proceeding any further we would like to note that we have a "free"
parameter t for every $Y_i$. The t parameter is free because we can choose
it any way we want in the appropriate interval. As will be seen in this
proof, picking $t = t_k$, $k = 0, 1, \ldots, N-1$, for every $Y_i$ yields
simpler observability/controllability criteria. That is, less computation is
needed to apply the test. Nevertheless, picking it otherwise will be of use
as discussed in the sequel.


For notational convenience (12) can be written in the following compact
form

    $Y = \mathcal{O}\, x(0)$    (13)

and the question is whether we can find $x(0)$. Now if the $Nnm \times n$
matrix $\mathcal{O}$ has rank less than n, then there exists a nonzero x
giving a linear combination of the columns of $\mathcal{O}$ adding to zero,
i.e.

    $0 = \mathcal{O} x = \sum_i x_i\, \mathrm{col}_i\, \mathcal{O}$.    (14)

Therefore, the condition that $\mathcal{O}$ has full rank is necessary for
observability.

To prove that it is sufficient we first multiply (13) on both sides by
$\mathcal{O}'$ to obtain

    $\mathcal{O}' Y = \mathcal{O}' \mathcal{O}\, x(0)$.    (15)

Now if $\mathcal{O}$ has rank n, it is equivalent [Kailath] to say that
$\mathcal{O}' \mathcal{O}$ is nonsingular, which means that $x(0)$ can be
directly obtained from (15) as

    $x(0) = (\mathcal{O}' \mathcal{O})^{-1} \mathcal{O}' Y$.    (16)

This is the only solution of (15). Moreover, it is also the only solution of
(13). To wit, assume that $x_1$ and $x_2$ are two different solutions of
(13); then we shall have

    $\mathcal{O} [x_1 - x_2] = 0$

which means that some combination of the columns of $\mathcal{O}$ is zero,
which contradicts the assumption that $\mathcal{O}$ has full rank. Therefore
the theorem is proved.
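
The construction in the proof can be illustrated numerically. The sketch below is an assumption-laden example rather than a procedure from this paper: the helper names are made up, `expm` is a simple Taylor-based approximation included to avoid extra dependencies, the free parameter t is picked at the start of each interval, and the two forms are hypothetical values chosen so that neither form is observable alone.

```python
import numpy as np

def expm(M, terms=20, squarings=10):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    M = np.asarray(M, dtype=float) / 2**squarings
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ M / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

def obsv(A, C):
    """Usual observability matrix [C; CA; ...; CA^(n-1)] of a single form."""
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

def hybrid_obsv(A_list, C_list, dts):
    """Stack O_i Exp(A_{i-1} dt_{i-1}) ... Exp(A_1 dt_1) for i = 1..N."""
    Phi = np.eye(A_list[0].shape[0])   # transition matrix up to the start of form i
    blocks = []
    for A, C, dt in zip(A_list, C_list, dts):
        blocks.append(obsv(A, C) @ Phi)
        Phi = expm(A * dt) @ Phi
    return np.vstack(blocks)

# Hypothetical forms: each one sees only one state component.
A = np.diag([-1.0, -2.0])
C1 = np.array([[1.0, 0.0]])
C2 = np.array([[0.0, 1.0]])
O = hybrid_obsv([A, A], [C1, C2], [1.0, 1.0])

x0 = np.array([2.0, 3.0])
Y = O @ x0                                     # stacked outputs and derivatives
x_hat = np.linalg.lstsq(O, Y, rcond=None)[0]   # least-squares form of (O'O)^{-1} O'Y
print(np.linalg.matrix_rank(O), x_hat)
```

Each form alone has an observability matrix of rank one, while the stacked matrix has rank two, so the initial state is recovered from the stacked data.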

4.  CONTROLLABILITY: 

At this point the dual algebraic controllability test is introduced. First
an analogous definition for controllability is proposed and used along with
the algebraic observability test to prove the result via the duality
principle.

Definition:

A hybrid system is said to be state-controllable if for any $t_0$ each state
$x(t_0)$ can be transferred to any final state $x_f$ after one period. Thus
there exists a $t_f$, $t_0 + T \le t_f < \infty$, such that
$x(t_f) = x_f$.
Before presenting any new algebraic controllability criterion (the dual to
the observability criterion given above), the usual controllability test for
time-varying systems is used. This is done in order to display certain
interesting properties of hybrid systems. Computing the controllability
Gramian and using the fact that our system is piece-wise constant yields the
following theorem.


THEOREM  3: 


A periodic hybrid system of N forms is controllable iff

    $W(t_0, t_0 + T) = \sum_{i=1}^{N} \int_{t_{i-1}}^{t_i} \Phi_i(\tau, t_0)\, B_i B_i'\, \Phi_i'(\tau, t_0)\, d\tau$    (17)

has full rank.


COROLLARY: 


A  periodic  hybrid  system  is  completely  controllable  iff  it  is 
controllable. 


Proof : 


See Remark (2.18) in [14], then use the above theorem.


Before proceeding any further, a necessary and sufficient condition for a
periodic hybrid system to be uniformly completely controllable is given.
This result will be of importance when stabilizability of such systems is in
question.


THEOREM  4: 


A  periodic  hybrid  system  is  uniformly  completely  controllable  iff  it  is 
completely  controllable. 


Proof : 


If the periodic system is completely controllable, there must exist a
finite $s \ge T$ such that $W(0, s) \ge \varepsilon I > 0$. Therefore the
result is proved by using a result from Silverman et al (Lemma 1) [10] and
Remark (2.18) in Kalman et al [14].


Having used the usual test we are ready to present an algebraic
controllability test very similar to the one used in linear time-invariant
theory. The following criterion applies to periodic hybrid systems. An
analogous criterion for hybrid systems will be introduced later in this
paper.


THEOREM  5: 


A periodic hybrid system of N forms is controllable iff the
controllability matrix

    $[C_N,\ \mathrm{Exp}(A_N \delta t_N) C_{N-1},\ \ldots,\ \mathrm{Exp}(A_N \delta t_N) \cdots \mathrm{Exp}(A_2 \delta t_2) C_1]$    (18)

has full rank, where $C_i$ is the usual controllability matrix of the ith
form, $i \in N$.


Proof : 


Using  the  principle  of  duality  and  the  algebraic  observability  theorem 
presented  above  proves  the  theorem. 

For computational purposes, it is better to rewrite the above
controllability matrix as follows

    $[C_N,\ \mathrm{Exp}(A_N \delta t_N)[C_{N-1},\ \ldots\ [C_3,\ \mathrm{Exp}(A_3 \delta t_3)[C_2,\ \mathrm{Exp}(A_2 \delta t_2) C_1]] \ldots ]]$.    (19)

This way one does not have to compute all of the matrices needed to express
(18) before computing its rank. Instead, the rank is checked sequentially
and (19) is augmented appropriately until full rank is achieved. If full
rank is never achieved, the system is not controllable. The same observation
applies to the observability criterion.
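
The sequential rank check can be sketched as follows. This is a hedged illustration: the helper names, the Taylor-based `expm`, and the two example forms are assumptions made for this sketch. The nested matrix is built from the innermost block outwards, and the loop stops as soon as full rank is reached, which is valid because each matrix exponential is invertible and so later augmentations cannot lower the rank.

```python
import numpy as np

def expm(M, terms=20, squarings=10):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    M = np.asarray(M, dtype=float) / 2**squarings
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ M / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

def ctrb(A, B):
    """Usual controllability matrix [B, AB, ..., A^(n-1)B] of a single form."""
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

def is_controllable(A_list, B_list, dts):
    """Build the nested matrix from the innermost block C_1 outwards, with early exit."""
    n = A_list[0].shape[0]
    M = ctrb(A_list[0], B_list[0])                       # innermost block C_1
    if np.linalg.matrix_rank(M) == n:
        return True
    for A, B, dt in zip(A_list[1:], B_list[1:], dts[1:]):
        M = np.hstack([ctrb(A, B), expm(A * dt) @ M])    # [C_k, Exp(A_k dt_k) * previous]
        if np.linalg.matrix_rank(M) == n:
            return True
    return False

# Hypothetical forms: each input drives only one state component.
A = np.diag([-1.0, -2.0])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
print(is_controllable([A], [B1], [1.0]))
print(is_controllable([A, A], [B1, B2], [1.0, 1.0]))
```

With a single form the rank stays at one, while adding the second form brings the nested matrix to full rank.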

Besides the above algebraic criteria for controllability and
observability, we are ready to introduce two more tests. The first is a very
simple necessary algebraic test that is geometrically and computationally
attractive. The second is a simple algebraic sufficient condition.

THEOREM 6:

A necessary algebraic condition for a hybrid system to be controllable is

    $\operatorname{rank}[C_1, C_2, \ldots, C_N] = \operatorname{rank} \mathcal{C} = n$,    (20)

where $C_i$ is the controllability matrix for the ith form, $i \in N$.


Proof: 


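We write for the state at $s$, when at time $t_0$ the system is in the zero
state,

    $x(s) = \int_{t_0}^{s} \Phi(s, \tau) B(\tau) u(\tau)\, d\tau$.    (21)

Using the fact that the system is piece-wise constant and the linearity
property of the integral operator yields for $s = t_N$

    $x(t_N) = \mathrm{Exp}(A_N \delta t_N) \cdots \mathrm{Exp}(A_2 \delta t_2) \int_{t_0}^{t_1} \mathrm{Exp}(A_1 (t_1 - \tau)) B_1 u(\tau)\, d\tau$
    $\qquad + \cdots$
    $\qquad + \mathrm{Exp}(A_N \delta t_N) \int_{t_{N-2}}^{t_{N-1}} \mathrm{Exp}(A_{N-1} (t_{N-1} - \tau)) B_{N-1} u(\tau)\, d\tau$
    $\qquad + \int_{t_{N-1}}^{t_N} \mathrm{Exp}(A_N (t_N - \tau)) B_N u(\tau)\, d\tau$.    (22)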


After expanding the exponential matrices inside every integral, it is found
that $x(t_N)$ is an element of the column range space of the controllability
matrix $C$ given in the theorem. Moreover, it is easy to see that

    $n \ge \operatorname{rank} \mathcal{C} \ge \operatorname{rank} C$,

an inequality that dictates that full rank of $\mathcal{C}$ is a necessary
condition for our system to be controllable.

The above proof gives an alternate way to prove the necessity part in
theorem 5. It is also interesting to note that this latter test is
independent of the order of the $\Sigma_i$'s. This order independence would
have been very beneficial if we did not lose it in the sufficiency part of
the proof.

Now we state a theorem that gives a simple sufficient algebraic test.
Together with the above simple necessary test this condition will provide an
efficient algebraic way to test for the controllability/observability of
hybrid systems. This theorem is adapted from a theorem given in [11].

THEOREM  7: 


A  sufficient  condition  for  a  periodic  hybrid  system  to  be  controllable  is 


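$$\operatorname{rank}[B_1,\ \mathrm{Exp}(A_1 \delta t_1) B_2,\ \ldots,\ \mathrm{Exp}(A_{N-1} \delta t_{N-1}) \cdots \mathrm{Exp}(A_1 \delta t_1) B_N] = \operatorname{rank}\,\mathcal{C} = n. \tag{23}$$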


Proof:

Since $\mathcal{C}$ has full rank, $\mathcal{C}\mathcal{C}' > 0$, i.e., it is positive definite. Also

$$C(s_1, s_2, \ldots, s_N)\, C'(s_1, s_2, \ldots, s_N) = \sum_{k=1}^{N} \Phi(s_k, t_0) B_k B_k' \Phi'(s_k, t_0), \tag{24}$$

where $s_k \in [t_{k-1}, t_k]$. Then

$$W(t_0, t_N) = \int_{t_0}^{t_N} \Phi(s, t_0) B(s) B'(s) \Phi'(s, t_0)\, ds \ge \sum_{k=1}^{N} \left( \int_{s_k}^{s_k + \sigma} \Phi(s, t_0) B(s) B'(s) \Phi'(s, t_0)\, ds \right) = \sigma\, C(s_1, s_2, \ldots, s_N, t_0)\, C'(s_1, s_2, \ldots, s_N, t_0) + o(\sigma), \tag{25}$$

for $\sigma$ sufficiently small.

If we assume that C has full rank, then for $\sigma$ small enough (25) is positive definite. But then (25) implies that $W(t_0, t_N) > 0$, and the theorem is proved.
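The sufficient test can be checked numerically by propagating each input matrix through the exponentials of the preceding forms and computing the rank of the stacked result. The sketch below is only illustrative: the function names are hypothetical, the matrix exponential is a simple scaling-and-squaring Taylor approximation written for this example, the forms and dwell times are assumed values, and the block ordering follows the reading $[B_1, \mathrm{Exp}(A_1 \delta t_1) B_2, \ldots]$ of the test matrix.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def expm(a, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(v) for v in row) for row in a)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[v / 2 ** squarings for v in row] for row in a]
    result, term = identity(n), identity(n)
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def rank(m, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    rows = [[float(v) for v in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        if r == len(rows):
            break
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) < tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def sufficient_test_matrix(forms, dwell_times):
    """Stack B_1, Exp(A_1 dt_1) B_2, ..., Exp(A_{N-1} dt_{N-1})...Exp(A_1 dt_1) B_N."""
    n = len(forms[0][0])
    propagator = identity(n)  # product of exponentials of the forms already visited
    blocks = []
    for (a, b), dt in zip(forms, dwell_times):
        blocks.append(matmul(propagator, b))
        step = expm([[v * dt for v in row] for row in a])
        propagator = matmul(step, propagator)
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]


# Assumed illustrative data: two forms with identity dynamics and different inputs.
A = [[1.0, 0.0], [0.0, 1.0]]
forms = [(A, [[1.0], [0.0]]), (A, [[0.0], [1.0]])]
test_matrix = sufficient_test_matrix(forms, [0.5, 0.5])
is_controllable = rank(test_matrix) == 2
```

If both forms shared the same input matrix, the stacked matrix would lose rank and the test would be inconclusive, since the condition is only sufficient.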


5.  APERIODIC  HYBRID  SYSTEMS: 


In  this  section  we  generalize  the  above  results  stated  for  periodic  hybrid 
systems  to  more  general  aperiodic  hybrid  systems.  Nevertheless,  many  of  the 
above  results  apply  to  hybrid  systems  (i.e.,  not  only  the  periodic  ones) 
without  modification.  Therefore  we  will  state  the  most  important  results  and 
leave  the  rest  to  the  interested  reader. 


THEOREM 8:


A hybrid system is controllable iff Theorem 5 holds for all N! possible permutations of the form-index set N.


THEOREM 9:


If Theorem 7 holds for the N! permutations of the index set N, then the hybrid system is controllable.

It is obvious that Theorem 6 applies to general hybrid systems too. Moreover, we think that the theorem is very "close" to being a sufficient condition as well. A heuristic argument can be given as follows: since any matrix exponential is a perturbation of the identity matrix, multiplying any matrix by matrix exponentials will not change its range space drastically. That is, if, for example, $C_1$ and $C_2$ have algebraically complementary range spaces (i.e., range($C_1$) is perpendicular to range($C_2$)), then range(Exp(AT)$C_1$) will almost always remain an algebraic complement of range($C_2$), but not necessarily perpendicular to it. As a matter of fact, Mariton [12] states that he has proved that Theorem 6 is also a sufficient condition.

6. STABILIZABILITY:


This section presents some results concerning the control and stabilization of hybrid systems. These results use off-the-shelf techniques to control/stabilize hybrid systems.


6.1 Stabilizability:


We would like to mention the work of Ikeda et al. [13]. In their work they looked at the relation between controllability properties of the system and various degrees of stability of the closed loop system resulting from linear feedback of the state variables.

Their results are as follows: For any initial time $t_0$, and any continuous and monotonically nondecreasing function $\delta(\cdot, t_0)$ such that $\delta(t_0, t_0) = 0$, the transition matrix $\Phi(\cdot,\cdot)$ of the closed loop system can be made such that $\|\Phi(t, t_0)\| \le a(t_0)\,\mathrm{Exp}\{-\delta(t, t_0)\}$ for all $t \ge t_0$, iff the system is completely controllable. Furthermore, in the case of a bounded system, for any $m \ge 0$, a bounded feedback matrix can be found such that $\|\Phi(t_2, t_1)\| \le a\,\mathrm{Exp}\{-m(t_2 - t_1)\}$ for all $t_1$, $t_2 \ge t_1$, iff the system is uniformly completely controllable. Thus, their results can be regarded, in some sense, as extensions of the well known results of closed loop pole assignment for time-invariant systems.

Therefore  there  is  a  high  degree  of  flexibility  in  the  stabilization  of 
hybrid  systems  if  they  are  either  completely  controllable  or  uniformly 
completely  controllable. 

As  an  illustration  of  the  above  result,  a  recipe  is  proposed  to  stabilize 
a  periodic  hybrid  system  via  state  feedback  when  all  of  the  forms  are  minimal. 
This  design  procedure  allows  the  designer  to  impose  or  choose  an  upper  bound 
on  the  norm  of  the  transition  matrix  of  the  hybrid  system  to  be  stabilized. 
Thus  the  norm  of  the  transition  matrix  for  hybrid  systems  plays  a  role  similar 
to  the  maximum  overshoot  and  time  constants  in  linear  time-invariant  systems. 

To impose an upper bound on the norm of the transition matrix, a known stability criterion [5] is used: the null solution of (1) is uniformly asymptotically stable iff there exist two positive constants $c_1$ and $c_2$ such that

$$\|\Phi(t, t_0)\| \le c_1\,\mathrm{Exp}(-c_2 (t - t_0)) \tag{26}$$

for all $t \ge t_0$. Therefore, using Theorem 1 leads to the following design criterion

$$\sum_i \eta(A_i)\,\delta t_i \le k_1 - k_2 T \tag{27}$$

where $k_1 = \ln(c_1)$ and T is the period of the hybrid system. The $k_i$'s, i = 1, 2, are the design parameters that the designer chooses according to his specifications, so that the upper bound on the transition matrix of the system, and consequently the time response of the plant, is as desired. This is possible because every form is controllable, so the closed-loop poles of each form can be assigned arbitrarily. Consequently, (27) can always be obtained via state feedback since every form is observable. It is important to note that this design procedure applies to periodic hybrid systems and to general hybrid systems as well. It is encouraging to note that the minimality condition for every form is not necessary to achieve such a design.

Before closing this section, we would like to mention another way to control uniformly completely controllable hybrid systems. This technique¹ is due to Kalman [9]. Kalman showed that, by using the mathematical concept of the generalized inverse of a matrix introduced by Penrose, it is possible to define a suitable control that will accomplish the desired transfer. Moreover, he was able to prove that the proposed control is the minimum energy control required to accomplish the transfer.

¹ It is interesting to note that this technique never made it into standard optimal control textbooks.

6.2  Example: 

The example is a 2-form periodic hybrid system. Forms one and two are described, respectively, by the following dynamics:

$\Sigma_1$: $dx_1/dt = x_1 + u$,

$dx_2/dt = x_2$,

$\Sigma_2$: $dx_1/dt = x_1$,

$dx_2/dt = x_2 + u$.

It  is  clear  that  both  forms  are  unstable.  Since  both  forms  are  diagonal 
their  transition  matrices  are  simple  to  compute  and  the  transition  matrix  of 
the  hybrid  system  is  found  to  be  unstable  too. 

Our goal is to stabilize this hybrid system. It is obvious that both forms are uncontrollable but the hybrid system is controllable. The controllability of the system is easily checked by any of the controllability tests introduced in this paper.

To stabilize the system we use Kalman's technique [14] to control the controllable subspace of each form. Starting at time zero, $\Sigma_1$ is on and remains so for $T_1$ time units. It is easy to check that

$$u(t) = -\frac{2\,\mathrm{Exp}(-t)}{1 - \mathrm{Exp}(-2T_1)}\, x_1(0), \quad t \in [0, T_1],$$

is an optimum open-loop control action that will take $x_1(0)$ to zero in $T_1$ time units (the negative sign is required for the state to be driven to zero). A similar control action will take $x_2(T_1)$ to zero in $T_2$ time units.


Therefore, steering the system to the origin is accomplished in one period. This control action resembles "dead-beat" control.
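The open-loop control for the first form can be checked by direct simulation. The sketch below integrates $dx_1/dt = x_1 + u$ with a fixed-step Runge-Kutta scheme, using the control $u(t) = -2\,\mathrm{Exp}(-t)\,x_1(0)/(1 - \mathrm{Exp}(-2T_1))$, i.e., the minimum-energy form with the sign chosen so that the state is driven to zero. The function names, the initial state, and the value of $T_1$ are assumptions made only for illustration.

```python
import math


def min_energy_control(t, x0, horizon):
    """Open-loop control steering dx/dt = x + u from x0 to zero at t = horizon."""
    return -2.0 * math.exp(-t) / (1.0 - math.exp(-2.0 * horizon)) * x0


def simulate(x0, horizon, steps=2000):
    """Integrate dx/dt = x + u(t) on [0, horizon] with classical RK4."""
    h = horizon / steps
    x, t = x0, 0.0

    def f(time, state):
        return state + min_energy_control(time, x0, horizon)

    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x


x1_initial = 3.0   # assumed initial condition x_1(0)
T1 = 1.0           # assumed dwell time of the first form
final_state = simulate(x1_initial, T1)
uncontrolled_final_state = x1_initial * math.exp(T1)  # growth with u = 0
```

During this interval $x_2$ evolves freely under the unstable dynamics of the first form; it is only brought to zero in the second half of the period, when the second form is active.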


7.  CONCLUSION: 


In this paper a special class of linear piecewise-constant time-varying systems was introduced. These systems are called hybrid systems because the set of linear time-invariant systems among which the systems are switching is finite.

Because hybrid systems share several features with linear time-invariant systems it was possible to derive the following results: 1. A necessary and sufficient stability condition and a simple sufficient criterion. 2. Algebraic necessary and sufficient controllability/observability tests analogous to the usual tests. 3. A very interesting necessary controllability/observability condition which is "almost" sufficient, along with a simple sufficient condition.

The necessary controllability/observability condition is a rank test on a flat block matrix composed of the controllability/observability matrices of every form, which makes it independent of the switching order. This order independence, along with the fact that the condition is "almost" sufficient, makes it a very useful test. Therefore, identifying the class of hybrid systems for which this condition is necessary and sufficient would be an interesting research problem.


State  feedback  via  switching  or  nonswitching  gains  is  an  interesting  topic 
that  needs  investigation.  Nonswitching  gains  are  very  useful  since  they 
eliminate  the  need  for  form-detection. 


Much more has to be done concerning the stability theory of this class of systems. The variable structure property seems to be a promising feature in this direction. In addition, if one thinks of every system $\Sigma_i(A_i, B_i, C_i)$ with $i \in N$ as an operator acting on the state x during $\delta t_i$, and these operators are applied successively, then this process can be viewed as an iterative process [4]. Viewing a hybrid system as an iterative process sheds some light on complicated issues such as the stability of such systems.


Moreover, if we discretize our continuous-time hybrid system with samples taken at the discontinuities, we obtain what we call the induced discrete-time hybrid system. Using the induced discrete-time model, one can apply discrete-time LQ theory to control/stabilize such systems. This remark implies that we probably only need to study discrete-time hybrid systems. At this point we would like to mention the work of Ludyk [15], where the author tries to solve the problem of eigenvalue assignment for time-varying discrete-time systems following the path of Wolovich [16]. Applying these techniques might give us more understanding of the control/stabilization of hybrid systems.


Finally, adapting the results of this paper to hybrid systems where the switching is a stochastic process, such as a Markov chain, might be of great usefulness.


ABSTRACT

Stochastic differential equations for the conditional density function and moments are presented for a linear system which is excited by a marked Poisson process whose rate depends on the state of the system and which is observed in white Gaussian noise. The set of optimal filtering equations is infinite dimensional; therefore, any practical filter is suboptimal. A suboptimal filter is developed for the case of unmarked Poisson excitation. This suboptimal filter estimates the Poisson process via a combined sequential estimation and detection scheme based on the criterion of maximum a posteriori (MAP) probability. An example computation is presented.

1. INTRODUCTION

This  paper  examines  the  issue  of  state  estimation  for  a  linear  system 
which  is  driven  by  a  Poisson  process  whose  rate  parameter  depends  on  the 
state  of  the  system.  The  input  process  is  described  as  "self-excited" 
since  its  rate  function  can  be  specified  given  the  past  history  of  the 
input  process. 

The  model  of  a  dynamic  system  driven  by  a  Poisson  process  with  a  state 
dependent  rate  is  motivated  by  several  practical  situations.  In  aircraft 
maneuvers,  the  pilot's  discrete  application  of  controls  is  sometimes 
modeled  as  a  Poisson  input  process.  It  is  reasonable  to  expect  that  the 
rate  of  the  control  actions  is  dependent  on  the  state  of  the  aircraft. 
Another  example  is  the  tracking  of  a  light  source  with  a  photon  detector. 
The  rate  of  photon  arrivals  certainly  depends  on  the  state  of  the  tracking 
system,  notably  the  tracking  error  angle. 

The most general system considered in this paper is described by the following scalar equations:

$$dx_t = a_t x_t\, dt + b_t\, d\eta_t \tag{1}$$

$$dz_t = c_t x_t\, dt + dw_t \tag{2}$$

where $\eta_t$ is a marked Poisson process whose marks (i.e., the amplitudes of the jumps) $\{u_i\}$ are a sequence of mutually independent, identically distributed random variables with density $p_u(u)$. The incident rate of $\eta_t$ is a memoryless function of the state, $\mu(x_t)$. The process $w_t$ is a Brownian motion with diffusion $V_t$.

The objective is to estimate $x_t$ given the history of the observation process, either $y_s$ or $z_s$, for $s \le t$. In Section 2, an expression for the minimum mean-squared error (MMSE) estimate is derived, and shown to be impractical. Good suboptimal approximations to the MMSE estimate are desirable, but are not pursued here. Instead, in Section 3, the maximum a posteriori (MAP) criterion is used to derive a practical filter for $x_t$.


2.  OPTIMAL  FILTER  EQUATIONS 

This section derives the stochastic partial differential equation satisfied by $p_{t|t}(x)$, the conditional density function of $x_t$ given $Z_t = \{z_s; s \le t\}$, based on a filtering theorem for white Gaussian observation noise. Furthermore, recursive equations are obtained for the central moments of this density function. The procedure used here is similar to the one used by Kwakernaak [1] to analyze a linear time invariant (LTI) system driven by an unmarked Poisson process with a constant rate.

First,  the  filtering  theorem  stated  in  Kwakernaak  [1]  is  summarized 
for  the  special  case  of  a  scalar  system  with  independent  observation  noise. 

Filtering Theorem [1]: Let $Q_t$, $t \ge t_0$, be the semi-martingale defined by

$$dQ_t = R_t\, dt + dM_t, \quad t \ge t_0, \tag{3}$$

where $M_t$ is a martingale with respect to a growing family of σ-fields $F_t$, $t \ge t_0$, and where $R_t$ is a process adapted to F. Let $z_t$, $t \ge t_0$, be the semi-martingale process

$$dz_t = h_t\, dt + dw_t, \quad t \ge t_0, \tag{4}$$

where $h_t$ is another process adapted to F, and $w_t$ is a Brownian motion independent of F, such that $E(dw_t^2) = V_t\, dt$, $V_t > 0$ for $t \ge t_0$. Define $Z_t$ as the growing family of σ-fields generated by the process $z_t$. For an arbitrary process $\xi_t$ define $\hat{\xi}_t = E(\xi_t \mid Z_t)$. Then $\hat{Q}_t$ satisfies the dynamic equation

$$d\hat{Q}_t = \hat{R}_t\, dt + [\widehat{Q_t h_t} - \hat{Q}_t \hat{h}_t]\, V_t^{-1}\, [dz_t - \hat{h}_t\, dt]. \tag{5}$$

The filtering theorem will be applied to $Q_t = e^{ivx_t}$, for $x_t$ as defined in (1). However, the differential rule for filtered Poisson processes must first be used to obtain $dQ_t$. The rule may be found in Snyder [2, p. 200], and is also a special case of the differential rule for discontinuous semi-martingales [1,3].

Differential Rule [2]: For an appropriately smooth function $Q(x_t)$ and for $x_t$ defined in (1), the rule is

$$dQ(x_t) = a_t x_t \frac{\partial Q(x_t)}{\partial x_t}\, dt + \int_U [Q(x_t + b_t u) - Q(x_t)]\, K(dt, du) \tag{6}$$

where the last integral is a counting integral [2, p. 195], evaluated over the mark space U, with respect to the Poisson counting measure K(dt,du). $K(\Delta t, A)$ is the number of jumps of $\eta$ during the interval $\Delta t$ with marks in the set $A \subset U$.

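Equation (6) may be put in the form of (3) by letting

$$dM_t = \int_U [Q(x_t + b_t u) - Q(x_t)]\,[K(dt, du) - \mu(x_t)\, p_u(u)\, dt\, du] \tag{7}$$

and taking $R_t\, dt$ as the remainder. The substitution of $R_t$ into (5) yields

$$d\,\widehat{e^{ivx_t}} = \left( iv a_t\, \widehat{x_t e^{ivx_t}} + \widehat{e^{ivx_t}\,[E(e^{ivb_t u}) - 1]\,\mu(x_t)} \right) dt + V_t^{-1} c_t \left[ \widehat{x_t e^{ivx_t}} - \hat{x}_t\, \widehat{e^{ivx_t}} \right] [dz_t - c_t \hat{x}_t\, dt], \tag{8}$$

where $E(e^{ivb_t u})$ is taken over the mark density $p_u(u)$.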

Let $\theta = b_t u$ (recall u is the mark variable) and let $p_\theta(\cdot)$ be the probability density function for $\theta$. If it is assumed that the conditional density function $p_{t|t}(x)$ exists, then taking the inverse Fourier transform of each term of (8) yields

$$dp_{t|t}(x) = L p_{t|t}(x)\, dt + V_t^{-1} c_t (x - \hat{x}_t)\, p_{t|t}(x)\, [dz_t - c_t \hat{x}_t\, dt] \tag{9}$$

where L is the linear operator given by

$$L p(x) = -\frac{\partial}{\partial x}[a_t x\, p(x)] + (p_\theta(x) * [\mu(x) p(x)]) - \mu(x) p(x) \tag{10}$$

where $*$ denotes convolution. As in Kwakernaak's case, equation (9) is the same as the Kushner equation for systems driven by Brownian motion, except for the definition of L.

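Equations (9) and (10) can be used to derive stochastic differential equations for $\hat{x}_t$ and the nth conditional central moments

$$P_{n,t} = E[(x_t - \hat{x}_t)^n \mid Z_t], \quad n = 1, 2, \ldots$$

as follows:

$$d\hat{x}_t = a_t \hat{x}_t\, dt + b_t E(u)\, \hat{\mu}(x_t)\, dt + V_t^{-1} c_t P_{2,t} [dz_t - c_t \hat{x}_t\, dt] \tag{11}$$

$$dP_{n,t} = n a_t P_{n,t}\, dt + \sum_{k=1}^{n} \binom{n}{k} b_t^k E(u^k)\, \widehat{(x_t - \hat{x}_t)^{n-k} \mu(x_t)}\, dt - n b_t E(u)\, \hat{\mu}(x_t)\, P_{n-1,t}\, dt + V_t^{-1} c_t [P_{n+1,t} - n P_{2,t} P_{n-1,t}] [dz_t - c_t \hat{x}_t\, dt] + n V_t^{-1} c_t^2 P_{2,t} \left[ \frac{n-1}{2} P_{2,t} P_{n-2,t} - P_{n,t} \right] dt, \quad n = 2, 3, \ldots \tag{12}$$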

Equations (11) and (12) represent an infinite set of coupled stochastic differential equations. Thus, an exact mean-squared error optimal filter is impossible to implement. Furthermore, in Kwakernaak's opinion, simple truncation of the moment equations (for the constant rate case) leads to unstable filters and generally poor results. Hence, approximate suboptimal filtering techniques are required, and are under investigation. This paper considers an alternative approach which uses a different error criterion, and is treated in the next section.

3.  A  MAP  APPROACH 

For this analysis, it is assumed that the driving process $\eta_t$ is a counting process, i.e., it has only unit jumps. Furthermore, it is assumed that the system being driven is linear time-invariant, that is, $a_t = a$ and $b_t = b$ in equation (1). Thus, it is clear that knowledge of the jump times implies knowledge of $x_t$. The approach followed in this section is to obtain MAP estimates of the number $N_T$ of jumps in $\eta$ and the jump times $\underline{\tau}_{N_T} = [\tau_1, \tau_2, \ldots, \tau_{N_T}]$ on the interval (0,T), given the observations $Y_T = \{y_s; s \le T\}$. The state estimate at time T, denoted $\hat{x}_T$, is then constructed by the appropriate superposition of impulse responses. The approach is made into a practical sequential algorithm by using time discretization and a finite time window.

This  is  an  extension  of  the  work  of  Au  and  Haddad  [3]  wherein  the 
approach  outlined  above  was  taken  for  marked  Poisson  driving  processes 
which  have  constant  known  rates. 

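The MAP estimates $\hat{N}_T$ and $\hat{\underline{\tau}}_M$ satisfy

$$(\hat{N}_T, \hat{\underline{\tau}}_M) = \arg \max_{0 \le N^* \le M,\ \underline{\tau}^*_M \in R^M} \ln p(N^*, \underline{\tau}^*_M \mid Y_T, N_T \le M) \tag{13}$$

where the argument of the logarithm is a joint a posteriori probability density function. M is an integer chosen large enough so that $\Pr[N_T > M]$ is negligible. The condition $N_T \le M$ ensures that $\underline{\tau}_M$ includes enough jump times to construct $\hat{x}_T$.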

The log of the density function in (13) can be replaced by the following expression without changing the result:

$$\frac{1}{2V} \int_0^T \Big[ 2y_t - \sum_{i=0}^{N^*} h(t, \tau^*_i) \Big] \Big[ \sum_{j=0}^{N^*} h(t, \tau^*_j) \Big] dt + \ln p_{\underline{\tau}_M | N_T, N_T \le M}(\underline{\tau}^*_M \mid N_T = N^*, N_T \le M) + \ln \Pr[N_T = N^* \mid N_T \le M]. \tag{14}$$

The first term is recognized as the log likelihood function, wherein $h(t, \tau^*)$ represents the response of the system at time t to an impulse at time $\tau^*$. For brevity, $h(t, \tau^*_0)$ is defined as the unforced response due to a known initial condition $x_0$.

The next objective is to simplify the expressions of the second and third terms of (14). Note that the event $N_T = N^*$ is also the event $\tau_{N^*} \le T < \tau_{N^*+1}$. Therefore, the probability density in the second term can be rewritten as

$$p_{\underline{\tau}_M | N_T, N_T \le M}(\underline{\tau}^*_M \mid N_T = N^*, N_T \le M) = \begin{cases} \dfrac{p_{\underline{\tau}_M | N_T \le M}(\underline{\tau}^*_M \mid N_T \le M)}{\Pr[\tau_{N^*} \le T,\ \tau_{N^*+1} > T \mid N_T \le M]} & \text{for } 0 < \tau^*_1 < \cdots < \tau^*_{N^*} \le T \text{ and } T < \tau^*_{N^*+1} < \cdots < \tau^*_M \\ 0 & \text{otherwise} \end{cases} \tag{15}$$

Since $\ln 0 = -\infty$, it is reasonable to restrict the region over which the expression in (13) is maximized to the region of support of (15). Under this restriction, the third term of (14) cancels the denominator of the nonzero part of (15).

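The remaining term to simplify is the numerator of the nonzero part of (15). It is noted that the event $N_T \le M$ is also the event $\tau_{M+1} > T$. Thus, the term of interest may be expressed as a marginal density function:

$$p_{\underline{\tau}_M | N_T \le M}(\underline{\tau}^*_M \mid N_T \le M) = \int p_{\underline{\tau}_{M+1} | \tau_{M+1} > T}(\underline{\tau}^*_{M+1} \mid \tau_{M+1} > T)\, d\tau^*_{M+1}. \tag{16}$$

It is noted that the region of support of the integrand is the "wedge" $0 < \tau^*_1 < \cdots < \tau^*_M < \tau^*_{M+1}$ minus the half space $\tau^*_{M+1} \le T$. Therefore, (16) can be rewritten as:

$$p_{\underline{\tau}_M | N_T \le M}(\underline{\tau}^*_M \mid N_T \le M) = \int_{\mathrm{Max}(\tau^*_M, T)}^{\infty} \frac{p_{\underline{\tau}_{M+1}}(\underline{\tau}^*_{M+1})}{\Pr[\tau_{M+1} > T]}\, d\tau^*_{M+1}. \tag{17}$$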

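The unconditional density in the integrand of (17) is a special case of the density considered by Snyder [2, p. 248] for a self-exciting point process. For this special case, the density can be expressed as:

$$p_{\underline{\tau}_{M+1}}(\underline{\tau}^*_{M+1}) = \begin{cases} \displaystyle\prod_{i=1}^{M+1} \mu[x_{\tau^*_i}(\underline{\tau}^*_{i-1})] \exp \int_{\tau^*_{i-1}}^{\tau^*_i} -\mu[x_t(\underline{\tau}^*_{i-1})]\, dt & \text{for } 0 < \tau^*_1 < \cdots < \tau^*_{M+1} \\ 0 & \text{otherwise} \end{cases} \tag{18}$$

with $\tau^*_0 = 0$ and

$$x_t(\underline{\tau}^*_i) = e^{at} x_0\, u_1(t) + b \sum_{j=1}^{i} \exp[a(t - \tau^*_j)]\, u_1(t - \tau^*_j)$$

where $u_1$ is the unit step function. In words, $x_t(\underline{\tau}^*_i)$ is the value of the state assuming that $\eta$ has had jumps only at times $\tau^*_1, \ldots, \tau^*_i$. Let $x_t(\underline{\tau}^*_0)$ denote the unforced value of the state.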
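The state $x_t(\underline{\tau}^*_i)$ and the logarithm of the jump-time density for a self-exciting process can be evaluated numerically as sketched below. This is a minimal illustration under stated assumptions: the function names are hypothetical, the integral of the rate is approximated by a midpoint rule, $\tau^*_0$ is taken to be 0, and the two-valued rate function is borrowed from the example in Section 5.

```python
import math


def state(t, jumps, a, b, x0):
    """x_t assuming unit jumps only at the given times (superposed impulse responses)."""
    x = math.exp(a * t) * x0 if t >= 0 else 0.0
    for tau in jumps:
        if t >= tau:
            x += b * math.exp(a * (t - tau))
    return x


def log_jump_density(jumps, rate, a, b, x0, steps=200):
    """Log of the joint density of ordered jump times for a state-dependent rate."""
    total, prev = 0.0, 0.0
    for i, tau in enumerate(jumps):
        if tau <= prev:
            return -math.inf  # outside the region 0 < tau_1 < tau_2 < ...
        earlier = jumps[:i]   # jumps that have already occurred before tau
        h = (tau - prev) / steps
        # Midpoint-rule integral of the rate between consecutive jump times.
        integral = h * sum(
            rate(state(prev + (k + 0.5) * h, earlier, a, b, x0)) for k in range(steps)
        )
        # The rate at the jump is evaluated on the state just before the jump.
        total += math.log(rate(state(tau, earlier, a, b, x0))) - integral
        prev = tau
    return total


def two_valued_rate(x):
    """Rate taking a low value near the origin and a high value elsewhere."""
    return 4.0 if abs(x) > 1.0 else 2.0


# Example values for the system parameters.
log_p = log_jump_density([0.2, 0.5, 0.9], two_valued_rate, a=-5.0, b=2.0, x0=0.0)
```

For a constant rate the expression collapses to the familiar Poisson form, the sum of the log rates minus the rate times the last jump time, which gives a convenient sanity check.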

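Substitution of (18) into (17) is straightforward due to the product form of (18) and yields

$$p_{\underline{\tau}_M | N_T \le M}(\underline{\tau}^*_M \mid N_T \le M) = \frac{p_{\underline{\tau}_M}(\underline{\tau}^*_M)}{\Pr[\tau_{M+1} > T]} \exp \int_{\tau^*_M}^{\mathrm{Max}(\tau^*_M, T)} -\mu[x_t(\underline{\tau}^*_M)]\, dt \tag{19}$$

where it has been assumed that there exists some $\alpha > 0$ such that $\mu[x] > \alpha$ for all x, thus making

$$\exp \int_{\tau^*_M}^{\infty} -\mu[x_t(\underline{\tau}^*_M)]\, dt = 0.$$

It is noted that $p_{\underline{\tau}_M}(\cdot)$ is defined by replacing M + 1 by M in (18).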

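Evaluation of the derivatives yields:

$$p_{\underline{\tau}_M}(\underline{\tau}^*_M) = \begin{cases} \displaystyle\prod_{i=1}^{M} \mu[x_{\tau^*_i}(\underline{\tau}^*_{i-1})] \exp \int_{\tau^*_{i-1}}^{\tau^*_i} -\mu[x_t(\underline{\tau}^*_{i-1})]\, dt & \text{for } 0 < \tau^*_1 < \cdots < \tau^*_M \\ 0 & \text{otherwise} \end{cases} \tag{20}$$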

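The combination of equations (14), (15), (19), and (20) results in a new MAP equation:

$$(\hat{N}_T, \hat{\underline{\tau}}_M) = \arg \max_{0 \le N^* \le M}\ \max_{\substack{\underline{\tau}^*_M \in R^M \\ 0 < \tau^*_1 < \cdots < \tau^*_{N^*} \le T \\ T < \tau^*_{N^*+1} < \cdots < \tau^*_M}} \Bigg\{ \frac{1}{2V} \int_0^T [2y_t - x_t(\underline{\tau}^*_M)]\,[x_t(\underline{\tau}^*_M)]\, dt + \ln \Bigg( \prod_{i=1}^{M} \mu[x_{\tau^*_i}(\underline{\tau}^*_{i-1})] \Bigg) - \int_0^{\mathrm{Max}(\tau^*_M, T)} \mu[x_t(\underline{\tau}^*_M)]\, dt \Bigg\} \tag{21}$$

where the maximization is to be performed in two steps, first over the $\tau^*$'s for fixed $N^*$, and second over the $N^*$'s.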
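The inner maximization over jump times can be illustrated on the smallest nontrivial case, M = N* = 1, where the objective reduces to a likelihood term, the log rate at the single jump, and the integrated rate over (0,T). The sketch below evaluates that objective on sampled observations and searches a grid of candidate jump times. All names, the sampling scheme (midpoint samples of a noiseless observation with c = 1), and the numerical values are assumptions chosen for illustration; this is not the sequential algorithm of Section 4.

```python
import math


def state(t, jumps, a, b, x0):
    """x_t assuming unit jumps only at the given times."""
    x = math.exp(a * t) * x0 if t >= 0 else 0.0
    for tau in jumps:
        if t >= tau:
            x += b * math.exp(a * (t - tau))
    return x


def rate(x):
    """Two-valued state-dependent rate, as in the example of Section 5."""
    return 4.0 if abs(x) > 1.0 else 2.0


def map_objective(tau, y, dt, var, a, b, x0):
    """MAP objective for M = N* = 1 and one candidate jump time tau in (0, T]."""
    fit = 0.0     # integral of [2 y_t - x_t] x_t dt (Riemann sum on midpoints)
    hazard = 0.0  # integral of the rate along the hypothesized trajectory
    for k, yk in enumerate(y):
        t = (k + 0.5) * dt
        x = state(t, [tau], a, b, x0)
        fit += (2.0 * yk - x) * x * dt
        hazard += rate(x) * dt
    log_rate_at_jump = math.log(rate(state(tau, [], a, b, x0)))
    return fit / (2.0 * var) + log_rate_at_jump - hazard


def best_jump_time(candidates, y, dt, var, a, b, x0):
    """Grid search for the jump time that maximizes the objective."""
    return max(candidates, key=lambda tau: map_objective(tau, y, dt, var, a, b, x0))


# Noiseless synthetic observations with a single jump at t = 0.4 on (0, 1).
a, b, x0, var, dt = -5.0, 2.0, 0.0, 0.01, 0.001
true_tau = 0.4
y = [state((k + 0.5) * dt, [true_tau], a, b, x0) for k in range(1000)]
candidates = [k / 20 for k in range(1, 20)]
estimate = best_jump_time(candidates, y, dt, var, a, b, x0)
```

With noisy observations the likelihood term no longer peaks exactly at the true jump time, and the prior terms then matter more; the outer maximization over N* would repeat this search for each hypothesized number of jumps.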


4.  SEQUENTIAL  MAP  APPROXIMATION 

The MAP equation (21) derived in the previous section is now approximated as a sequential algorithm. In this approximation, the observations are processed in subintervals each of length $\Delta$, which is chosen such that the probability of having two or more jumps in each interval is negligibly small. Each subinterval of observations is used to detect a jump in the subinterval and to estimate the jump time, as well as to update the estimates of past jump times.

In order to reduce the computational complexity of the algorithm, estimates further than L subintervals away from the new subinterval are not updated and are considered "finalized." The selection of L represents a tradeoff between performance and complexity. Thus, observations in the Kth subinterval $[(K-1)\Delta, K\Delta)$ are used to update estimates in the "window" $[(K-L)\Delta, K\Delta)$. $\hat{N}_{(K-L)\Delta}$ represents the number of finalized estimates of jump times.

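Equation (21) is next modified so that maximization is performed only over jump times occurring after the time $(K-L)\Delta$. Any additive terms which depend solely on finalized estimates are dropped. For brevity, let $N_F = \hat{N}_{(K-L)\Delta}$ ("F" for finalized). Furthermore, redefine $\underline{\tau}_L$ as $[\tau_{N_F+1}, \ldots, \tau_{N_F+L}]$, and redefine $x_t(\underline{\tau}^*_L)$ as the state assuming that jumps have occurred only at the finalized times and at the proposed times $\underline{\tau}^*_L$. The modified (approximate) version of (21) is:

$$(\hat{N}_{K\Delta}, \hat{\underline{\tau}}_L) = \arg \max_{N_F \le N^* \le N_F + L}\ \max_{\substack{\hat{\tau}_{N_F} \le \tau^*_{N_F+1} < \cdots < \tau^*_{N^*} < K\Delta \\ K\Delta \le \tau^*_{N^*+1} \le \cdots \le \tau^*_{N_F+L}}} \Bigg\{ \frac{1}{2V} \int_{(K-L)\Delta}^{K\Delta} [2y_t - x_t(\underline{\tau}^*_L)]\,[x_t(\underline{\tau}^*_L)]\, dt + \ln \Bigg( \prod_{i=N_F+1}^{N_F+L} \mu[x_{\tau^*_i}(\underline{\tau}^*_{i-1})] \Bigg) - \int_{(K-L)\Delta}^{\mathrm{Max}(\tau^*_{N_F+L}, K\Delta)} \mu[x_t(\underline{\tau}^*_L)]\, dt \Bigg\} \tag{22}$$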


There is a remaining difficulty with the maximization over the $\tau^*$'s in (22). Assume that this maximization is being performed for a given, fixed $N^*$. Furthermore, assume a discretized domain, i.e., a subset of equally spaced discrete values in $R^L$. The discretization implies that the expression in (22) is evaluated over a finite number of values for the $\tau^*$'s between $\hat{\tau}_{N_F}$ and $K\Delta$, but there are still an infinite number of values to check for the $\tau^*$'s above $K\Delta$. Maximizing over these "future" jump times is equivalent to maximizing the joint a priori probability density for these jump times.

The constant rate case ($\mu[x_t]$ constant) presents no difficulty, because the joint a priori density function for the jump times after $K\Delta$ has its maximum at $\tau^*_{N^*+1} = \tau^*_{N^*+2} = \cdots = \tau^*_{N_F+L} = K\Delta$. It is easily shown that the same is true for stable first order systems and rate functions $\mu[x]$ which monotonically increase with $|x|$. However, for more general LTI systems and rate functions, finding the maximum of the a priori joint density is apparently not as easy. This matter is currently under investigation.


5.  EXAMPLE 

Figures 1 and 2 display simulation results based on the algorithm of Section 4. The parameters are (see equations 1 and 2) $a_t = -5$, $b_t = 2$, and $c_t = 1$. The rate or intensity, $\mu(x_t)$, of the counting process $\eta_t$ takes only two values: $\mu(x_t) = 2$ for $|x_t| < 1$ and $\mu(x_t) = 4$ for $|x_t| > 1$. Figure 1 contains the state trajectory. The rate takes its high value when the trajectory is above the dashed line and the low value otherwise.

For estimation, $\Delta = 0.03125$ sec. This yields an approximate upper bound for $\Pr[\eta_{t+\Delta} - \eta_t \ge 1]$ of $4\Delta = 0.125$. The observation noise samples have a standard deviation ($\sqrt{V}$) of 0.15. The estimation/detection window is L = 4. Estimation results are shown in Figure 2. Some errors may be observed at t = 2 and 3 < t < 4. It is noted that for $\sqrt{V} = 0.1$, all of the jumps were correctly detected (to the order of the simulation sample period) and for $\sqrt{V} = 0.2$, several more false detections occurred in the region 0.5 < t < 1.5.
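A trajectory of the kind used in this example can be generated with a simple discrete-time approximation. The sketch below is an assumption-laden illustration rather than the simulation used for the figures: it places at most one jump per step of length $\Delta$ with probability $\mu(x)\Delta$, starts from a zero initial state, adds Gaussian noise to each observation sample, and uses hypothetical function names and an arbitrary random seed.

```python
import math
import random


def rate(x):
    """Two-valued rate: 2 for |x| < 1 and 4 for |x| > 1."""
    return 4.0 if abs(x) > 1.0 else 2.0


def simulate(a=-5.0, b=2.0, c=1.0, noise_std=0.15, delta=0.03125, horizon=5.0, seed=0):
    """Simulate the state, noisy observation samples, and jump times."""
    rng = random.Random(seed)
    steps = int(round(horizon / delta))
    decay = math.exp(a * delta)  # exact decay of the unforced state over one step
    x = 0.0                      # assumed initial condition
    states, observations, jump_times = [], [], []
    for k in range(steps):
        # Bernoulli approximation of the state-dependent Poisson process.
        if rng.random() < rate(x) * delta:
            x += b
            jump_times.append(k * delta)
        states.append(x)
        observations.append(c * x + rng.gauss(0.0, noise_std))
        x *= decay
    return states, observations, jump_times


delta = 0.03125
jump_probability_bound = 4.0 * delta                       # first-order bound
exact_jump_probability = 1.0 - math.exp(-4.0 * delta)      # at the highest rate
states, observations, jump_times = simulate()
```

The first-order bound 4Δ slightly overestimates the probability of at least one jump at the highest rate, which is why it is described as an approximate upper bound.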


6.  CONCLUSIONS 

The  state  estimation  problem  has  been  considered  for  a  linear  system 
observed  in  additive  white  Gaussian  noise,  where  the  system  is  driven  by  a 
Poisson  process  with  a  state  dependent  rate.  It  is  no  surprise  that  the 
minimum  mean-squared  estimator  is  infinite  dimensional,  since  the  same  is 
true  for  the  simpler  constant  rate  case.  However,  it  is  expected  that  the 
form  of  the  equations  will  suggest  a  good  suboptimal  approximation  in  the 
future.  An  implementable  estimator  was  developed  based  on  maximum 
a  posteriori  (MAP)  estimates  of  the  number  and  times  of  the  jumps  in  the 
driving  process.  However,  the  feasibility  of  this  scheme  has  been  shown 
only  for  certain  LTI  systems  and  rate  functions.  Further  investigation 
is  needed  to  enlarge  the  apparently  limited  applicability  of  this  MAP 
approach.

Figure 1. Example state trajectory

Figure 2. Estimate output


